Abstract. Let $X_1, \ldots, X_n$ be i.i.d. copies of a random variable $X = Y + Z$, where $X_i = Y_i + Z_i$, and $Y_i$ and $Z_i$ are independent and have the same distribution as $Y$ and $Z$, respectively. Assume that the random variables $Y_i$ are unobservable and that $Y = AV$, where $A$ and $V$ are independent, $A$ has a Bernoulli distribution with probability of success equal to $1 - p$ and $V$ has a distribution function $F$ with density $f$. Let the random variable $Z$ have a known distribution with density $k$. Based on a sample $X_1, \ldots, X_n$, we consider the problem of nonparametric estimation of the density $f$ and the probability $p$. Our estimators of $f$ and $p$ are constructed via Fourier inversion and kernel smoothing. We derive their convergence rates over suitable functional classes. By establishing in a number of cases the lower bounds for estimation of $f$ and $p$ we show that our estimators are rate-optimal in these cases.

1. Introduction

Let $X_1, \ldots, X_n$ be i.i.d. copies of a random variable $X = Y + Z$, where $X_i = Y_i + Z_i$, and $Y_i$ and $Z_i$ are independent and have the same distribution as $Y$ and $Z$, respectively. Assume that the random variables $Y_i$ are unobservable and that $Y = AV$, where $A$ and $V$ are independent, $A$ has a Bernoulli distribution with probability of success equal to $1 - p$ and $V$ has a distribution function $F$ with density $f$. Furthermore, let the random variable $Z$ have a known distribution with density $k$. Based on a sample $X_1, \ldots, X_n$, we consider the problem of nonparametric estimation of the density $f$ and the probability $p$. This problem was recently introduced in van Es et al. (2008) for the case when $Z$ is normally distributed and in Lee et al. (2010) for a class of more general error distributions. It is referred to as deconvolution for an atomic distribution, which reflects the fact that the distribution of $Y$ has an atom of size $p$ at zero and that we have to reconstruct ('deconvolve') $p$ and $f$ from the observations from the convolution structure $X = Y + Z$. When $p$ is known to be equal to zero, i.e. when $Y$ has a density, the problem reduces to the classical and much studied deconvolution problem, see e.g. Meister (2009) for an introduction to the latter and many recent references.
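To make the sampling scheme concrete, the following minimal sketch draws observations from the model $X = AV + Z$. The function name `simulate_atomic_sample`, the normal choice for $V$, the Laplace choice for $Z$ and all numerical values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def simulate_atomic_sample(n, p, draw_v, draw_z, rng):
    """Draw X_i = Y_i + Z_i with Y_i = A_i * V_i and P(A_i = 0) = p."""
    a = rng.random(n) >= p        # A_i = 1 with probability 1 - p
    v = draw_v(n, rng)            # continuous part V with density f
    z = draw_z(n, rng)            # measurement error Z with known density k
    y = np.where(a, v, 0.0)       # Y has an atom of size p at zero
    return y + z, y

rng = np.random.default_rng(0)
x, y = simulate_atomic_sample(
    20000, 0.3,
    draw_v=lambda m, r: r.normal(3.0, 1.0, m),    # hypothetical choice of V
    draw_z=lambda m, r: r.laplace(0.0, 1.0, m),   # hypothetical choice of Z
    rng=rng,
)
print("share of exact zeros among Y:", np.mean(y == 0.0))
print("share of exact zeros among X:", np.mean(x == 0.0))
```

In such a simulation the unobservable $Y_i$ contain a visible proportion of exact zeros, while the observed $X_i$ contain essentially none, which is why $p$ cannot be estimated by counting zeros in the data.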



The above problem arises in a number of practical situations. For instance, suppose that a measurement device is used to measure some quantity of interest. Let it have a probability of failure to detect this quantity equal to $p$, in which case it renders zero. Repetitive measurements of the quantity of interest can be modelled by random variables $Y_i$ defined as above. Assume that our goal is to estimate the density $f$ and the probability of failure $p$. If we could use the measurements $Y_i$ directly, then when estimating $f$, zero measurements could be discarded and we could base our estimator of $f$ on the nonzero observations. The probability $p$ could be estimated by the proportion of zero observations. However, in practice it is often the case that some measurement error is present. This can be modelled by random variables $Z_i$ and, assuming the additive measurement error structure, in such a case the observations are $X_i = Y_i + Z_i$. Now notice that due to the measurement error, the zero $Y_i$'s cannot be distinguished from the nonzero $Y_i$'s. If we do not want to impose parametric assumptions on $f$, the use of nonparametric deconvolution techniques will be unavoidable when estimating $f$.



Another example comes from evolutionary biology, see Section 4 in Lee et al. (2010): suppose that a virus lineage is grown in a lab for a number of days in a manner that promotes accumulation of mutations. Plaque size can be used as a measure of viral fitness. Assume that it is measured every day and let the mutation effect on viral fitness be defined as a change in plaque size. If a high fitness virus is used, during any time interval in terms of mutations there are only two possibilities: either 1) no mutation, or only a silent mutation occurs, or 2) a deleterious mutation occurs. Due to the fact that a silent mutation does not affect fitness, theoretically it will not change the plaque size and hence the mutation effect is zero for the first case. Deleterious mutations on the other hand will affect the plaque size. Since the distribution of deleterious mutation effects is usually considered to be continuous, the distribution of mutation effects can be expressed as a mixture of a point mass at zero, which corresponds to scenario 1), and a continuous distribution, which corresponds to scenario 2). Presence of measurement errors (which can be assumed to be additive) when measuring the plaque size leads precisely to the deconvolution problem for an atomic distribution.

Deconvolution for an atomic distribution is also closely related to empirical Bayes estimation of a mean of a high-dimensional normally distributed vector, see e.g. Jiang and Zhang (2009) for the description of the problem and many references. In more detail, let $X_i \sim N(\theta_i, 1)$, $i = 1, \ldots, n$, be independent, where $N(\theta_i, 1)$ denotes the normal distribution with mean $\theta_i$ and variance 1, and suppose that based on $X_1, \ldots, X_n$ the goal is to estimate the mean vector $\theta = (\theta_1, \ldots, \theta_n)$. This has applications e.g. in denoising a noisy signal or image. It is often the case that the vector $\theta$ is sparse in some sense in that many of the $\theta_i$'s are zero or close to zero. The notion of sparsity can be naturally modelled in a Bayesian way by putting independent priors $\Pi_i(dx) = p\,1_{[x=0]}\,dx + (1-p)F(dx)$ on each component $\theta_i$ of $\theta$, where $0 < p < 1$ and $F$ is a continuous distribution function. Notice that excess of zeros among the $\theta_i$'s is matched by choosing the prior $\Pi_i$ that has a point mass at zero. In the empirical Bayes approach to estimation of $\theta$ the hyperparameters $p$ and $F$ of the priors $\Pi_i$ are estimated from the data $X_1, \ldots, X_n$. This leads precisely to the deconvolution problem for an atomic distribution.

A related problem is estimation of the proportion of non-null effects in the large-scale multiple testing framework, see e.g. Cai and Jin (2010). In large-scale multiple testing one is interested in simultaneous testing of a large number of hypotheses $H_1, \ldots, H_n$. Suppose that with every hypothesis $H_i$ there is associated a corresponding test statistic $X_i$. A popular framework for large-scale multiple testing is the two-group random mixture model, where one assumes that each hypothesis $H_i$ has a certain unknown probability $\pi$ of being true (the approach is empirical Bayes in its essence) and the test statistics $X_i$ are independent and are generated from a mixture of two densities, $X_i \sim (1-\pi) f_{\mathrm{null}} + \pi f_{\mathrm{alt}}$. Here $\pi$ (the same for all $i$) is called the probability of null effects, $f_{\mathrm{null}}$ is the null density and $f_{\mathrm{alt}}$ is the non-null density. Often $f_{\mathrm{null}}$ is modelled as a density of a normal distribution $N(\mu_0, \sigma_0^2)$, while the density $f_{\mathrm{alt}}$ is modelled as a Gaussian location-scale mixture

$$f_{\mathrm{alt}}(x) = \int \frac{1}{\sigma}\,\varphi\!\left(\frac{x-\mu}{\sigma}\right) dG(\mu, \sigma),$$

where $\varphi$ is the standard normal density and $G$ is the mixing distribution which is assumed to be unknown. Observe that $\pi$ in this case plays a role similar to $1 - p$ in the deconvolution problem for an atomic distribution. Estimation of the probability $\pi$ and the mixing distribution $G$ based on $X_1, \ldots, X_n$ leads to a problem strongly related to the deconvolution problem for an atomic distribution.

After these motivating examples we return to the deconvolution problem for an atomic distribution and move to the construction of estimators of $p$ and $f$ (our notation is as in the first paragraph of this section). Because of the great similarity of our problem to the classical deconvolution problem, one natural approach to estimation of $p$ and $f$ is based on the use of Fourier inversion and kernel smoothing, cf. Section 2.2.1 in Meister (2009). In the sequel $\phi_\xi$ will denote the characteristic function of a random variable $\xi$. The Fourier transform of a function $q$ will be denoted by $\phi_q$. Suppose that $\phi_Z(t) \neq 0$ for all $t \in \mathbb{R}$. Following van Es et al. (2008), we define an estimator $p_{ng_n}$ of $p$ as

$$p_{ng_n} = \frac{g_n}{2} \int_{-1/g_n}^{1/g_n} \frac{\phi_{\mathrm{emp}}(t)\,\phi_u(g_n t)}{\phi_Z(t)}\,dt, \tag{1}$$

where a number $g_n > 0$ denotes a bandwidth, $\phi_u$ is the Fourier transform of some fixed function (a kernel) $u$ chosen beforehand and $\phi_{\mathrm{emp}}(t) = n^{-1}\sum_{j=1}^{n} e^{itX_j}$ is the empirical characteristic function. To make the definition of $p_{ng_n}$ meaningful, we assume that $\phi_u$ has support on $[-1,1]$. This guarantees integrability of the integrand in (1). We also assume that $\phi_u$ is real-valued, bounded, symmetric and integrates to two. Other conditions on $u$ will be stated in the next section. Notice that $p_{ng_n}$ is real-valued, because for its complex conjugate we have $\overline{p_{ng_n}} = p_{ng_n}$.
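The definition of the estimator of $p$ can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the kernel transform `phi_u` is assumed to be $(693/8)\,t^6(1-t^2)^2$ on $[-1,1]$, the integral is approximated by a trapezoidal rule after the substitution $s = g_n t$, and the names `estimate_p` and `grid_size`, the Laplace error distribution and all numerical values are hypothetical choices.

```python
import numpy as np

def phi_u(s):
    # Assumed kernel transform: symmetric, supported on [-1, 1], integrates to two.
    s = np.asarray(s, dtype=float)
    return np.where(np.abs(s) <= 1.0, (693.0 / 8.0) * s**6 * (1.0 - s**2) ** 2, 0.0)

def estimate_p(x, g, phi_z, grid_size=201):
    """Approximate (g/2) * int_{-1/g}^{1/g} phi_emp(t) phi_u(g t) / phi_Z(t) dt."""
    s = np.linspace(-1.0, 1.0, grid_size)     # substitution s = g * t
    t = s / g
    # empirical characteristic function evaluated on the grid
    phi_emp = np.array([np.exp(1j * tk * x).mean() for tk in t])
    # the imaginary parts cancel by symmetry, so only the real part is kept
    integrand = (phi_emp * phi_u(s) / phi_z(t)).real
    ds = s[1] - s[0]
    # trapezoidal rule for (1/2) * int_{-1}^{1} ... ds
    return 0.5 * ds * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

rng = np.random.default_rng(1)
n, p = 20000, 0.3
y = np.where(rng.random(n) >= p, rng.normal(0.0, 3.0, n), 0.0)   # Y = A * V
x = y + rng.laplace(0.0, 1.0, n)                                 # X = Y + Z
# the standard Laplace distribution has characteristic function 1 / (1 + t^2)
p_hat = estimate_p(x, g=0.5, phi_z=lambda t: 1.0 / (1.0 + t**2))
print("estimate of p:", p_hat)
```

The bandwidth `g=0.5` is picked by hand for this toy example; the theoretical bandwidth choices are of asymptotic nature and do not prescribe a value for a given sample.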



The heuristics behind the definition of $p_{ng_n}$ are the same as in van Es et al. (2008): using $\phi_X(t) = \phi_Y(t)\phi_Z(t)$ and $\phi_Y(t) = p + (1-p)\phi_f(t)$, we have

$$
\begin{aligned}
\lim_{g_n \to 0} \frac{g_n}{2} \int_{-1/g_n}^{1/g_n} \frac{\phi_X(t)\,\phi_u(g_n t)}{\phi_Z(t)}\,dt
&= \lim_{g_n \to 0} \frac{g_n}{2} \int_{-1/g_n}^{1/g_n} \phi_Y(t)\,\phi_u(g_n t)\,dt \\
&= \lim_{g_n \to 0} \frac{g_n}{2} \int_{-1/g_n}^{1/g_n} p\,\phi_u(g_n t)\,dt
 + \lim_{g_n \to 0} \frac{g_n}{2} \int_{-1/g_n}^{1/g_n} (1-p)\,\phi_f(t)\,\phi_u(g_n t)\,dt \\
&= p,
\end{aligned}
$$

provided $\phi_f(t)$ is integrable. The last equality follows from the dominated convergence theorem and the fact that $\phi_u$ integrates to two. Notice that this estimator coincides with the one in Lee et al. (2010) when $u$ is the sinc kernel, i.e. $u(x) = \sin(x)/(\pi x)$. The Fourier transform of this kernel is given by $\phi_u(t) = 1_{[-1,1]}(t)$. In general $p_{ng_n}$ might take on negative values, even though for large $n$ the probability of this event will be small. In any case this is of minor importance, because we can always truncate $p_{ng_n}$ from below at zero, i.e. we can define an estimator of $p$ as $p^{+}_{ng_n} = \max(0, p_{ng_n})$. This new estimator of $p$ has risk (quantified by the mean square error) not larger than that of $p_{ng_n}$:

$$\mathrm{E}_{p,f}\big[(p^{+}_{ng_n} - p)^2\big] \le \mathrm{E}_{p,f}\big[(p_{ng_n} - p)^2\big].$$
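The claim that truncation at zero cannot increase the mean square error rests on an elementary pointwise inequality. A short worked argument, written here as a sketch in which $x$ stands for a realised value of the untruncated estimator:

```latex
% For every real x and every p >= 0:
%   if x >= 0, then max(0, x) = x and the two errors coincide;
%   if x <  0, then |max(0, x) - p| = p <= p - x = |x - p|.
\[
  |\max(0, x) - p| \le |x - p|
  \qquad \text{for all } x \in \mathbb{R},\ p \in [0, 1).
\]
% Squaring both sides and taking expectations gives the risk comparison.
```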

Remark 1. In order to keep our notation compact, in the sequel instead of writing the expectation under the parameter pair $(p, f)$ as $\mathrm{E}_{p,f}[\cdot]$, we will simply write $\mathrm{E}[\cdot]$. □

Next we turn to the construction of an estimator of $f$. Let

$$\hat{p}_{ng_n} = \max\big(-1 + \epsilon_n,\ \min(p_{ng_n},\ 1 - \epsilon_n)\big), \tag{2}$$

where $0 < \epsilon_n < 1$ and $\epsilon_n \downarrow 0$ at a suitable rate to be specified later on. Notice that $|\hat{p}_{ng_n}| \le 1 - \epsilon_n$. Truncating $p_{ng_n}$ from below at $-1 + \epsilon_n$ and not at zero will make proofs of the asymptotic results for an estimator of $f$ somewhat shorter, although truncation at zero is still a valid option. As in van Es et al. (2008), we propose the following estimator of $f$,

$$f_{nh_ng_n}(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx}\, \frac{\phi_{\mathrm{emp}}(t) - \hat{p}_{ng_n}\phi_Z(t)}{(1 - \hat{p}_{ng_n})\,\phi_Z(t)}\, \phi_w(h_n t)\,dt, \tag{3}$$

where $w$ is a kernel function with a real-valued and symmetric Fourier transform $\phi_w$ supported on $[-1,1]$ and $h_n > 0$ is a bandwidth. Notice that $\overline{f_{nh_ng_n}(x)} = f_{nh_ng_n}(x)$ and hence $f_{nh_ng_n}(x)$ is real-valued. It is clear that $p_{ng_n}$ is truncated to $\hat{p}_{ng_n}$ in order to control the factor $(1 - \hat{p}_{ng_n})^{-1}$ in (3). The definition of $f_{nh_ng_n}$ is motivated by the fact that

$$\phi_f(t) = \frac{\phi_X(t) - p\,\phi_Z(t)}{(1-p)\,\phi_Z(t)},$$

cf. equation (1.2) in van Es et al. (2008). Thus $f_{nh_ng_n}$ is obtained by replacing $\phi_X$ and $p$ by their estimators and application of appropriate regularisation determined by the kernel $w$ and bandwidth $h_n$. The estimator $f_{nh_ng_n}$ essentially coincides with the one in Lee et al. (2010) when both $u$ and $w$ are taken to be the sinc kernels.

Again, notice that with positive probability $f_{nh_ng_n}(x)$ might become negative for some $x \in \mathbb{R}$, a little drawback often shared by kernel-type density estimators. Some correction method can be used to remedy this drawback, for instance one can define $f^{+}_{nh_ng_n}(x) = \max(0, f_{nh_ng_n}(x))$, as this does not increase the pointwise risk of the estimator. Note that this possible negativity of $f_{nh_ng_n}$ cannot be remedied only by truncating $p_{ng_n}$ from below at zero and then using this new estimator instead of $\hat{p}_{ng_n}$ in (3). Observe also that $f^{+}_{nh_ng_n}$ can be rescaled to integrate to one and thus can be turned into a probability density. An alternative correction method to turn a possibly negative density estimator into a probability density is described in Glad et al. (2003). We do not pursue these questions any further.
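The estimator of $f$ can be sketched numerically in the same spirit. The block below is an illustration under stated assumptions rather than the authors' code: the kernel transform `phi_w` is assumed to be $(1-t^2)^3$ on $[-1,1]$, the Fourier integral is replaced by a Riemann sum, and the true value of $p$ is plugged in as a stand-in for the truncated estimator of $p$. The names `estimate_f` and `grid_size`, the distributions and the bandwidth are hypothetical.

```python
import numpy as np

def phi_w(s):
    # Assumed kernel transform: phi_w(0) = 1, symmetric, supported on [-1, 1].
    s = np.asarray(s, dtype=float)
    return np.where(np.abs(s) <= 1.0, (1.0 - s**2) ** 3, 0.0)

def estimate_f(x_eval, sample, p_hat, h, phi_z, grid_size=401):
    """Riemann-sum version of the Fourier-inversion estimator; needs p_hat < 1."""
    s = np.linspace(-1.0, 1.0, grid_size)      # s = h * t, so phi_w(h t) = phi_w(s)
    t = s / h
    phi_emp = np.array([np.exp(1j * tk * sample).mean() for tk in t])
    ratio = (phi_emp - p_hat * phi_z(t)) / ((1.0 - p_hat) * phi_z(t))
    weights = ratio * phi_w(s) * (t[1] - t[0])
    values = np.exp(-1j * np.outer(x_eval, t)) @ weights
    return values.real / (2.0 * np.pi)

rng = np.random.default_rng(2)
n, p = 5000, 0.3
y = np.where(rng.random(n) >= p, rng.normal(3.0, 1.0, n), 0.0)   # V ~ N(3, 1)
x = y + rng.laplace(0.0, 1.0, n)                                 # Laplace errors
x_grid = np.linspace(-15.0, 20.0, 701)
f_hat = estimate_f(x_grid, x, p_hat=p, h=0.5, phi_z=lambda t: 1.0 / (1.0 + t**2))
total_mass = f_hat.sum() * (x_grid[1] - x_grid[0])
print("integral of the estimate over the grid:", total_mass)
print("location of the maximum:", x_grid[np.argmax(f_hat)])
```

Since the weight function equals one at $t = 0$, the estimate integrates to approximately one over a wide grid, although it may dip below zero locally.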



In the present work we assume that the distribution of $Z$ is known. In practice this is not always the case. If the distribution of $Z$ is totally unknown, then next to the sample $X_1, \ldots, X_n$ one typically will need some additional data in order to construct consistent estimators of $f$ and $p$. For instance, the case when additional measurements on $Z$, say $Z_1, \ldots, Z_m$, are available in the classical deconvolution problem with a priori known $p = 0$ is dealt with in Johannes (2009). Furthermore, one can also consider the case when the distribution of $Z$ is known up to a scale parameter. The relevant papers in the classical deconvolution context are Butucea and Matias (2005) and Meister (2006). Although conceivable in principle, extension of our results to these cases is beyond the scope of the present work.

In the rest of the paper we concentrate on asymptotics of the estimators $p_{ng_n}$ and $f_{nh_ng_n}$. In particular, we derive upper bounds on the supremum of the mean square error of the estimator $p_{ng_n}$ and the supremum of the mean integrated square error of the estimator $f_{nh_ng_n}$ taken over an appropriate class of the densities $f$ and an appropriate interval for the probability $p$. Our results complement those in van Es et al. (2008), where the asymptotic normality of the estimators $p_{ng_n}$ and $f_{nh_ng_n}$ is established. However, the present results are also more general, as we consider more general error distributions, and not necessarily the normal distribution as in van Es et al. (2008). Weak consistency of the estimators (1) and (3) based on the sinc kernel has been established under wide conditions in Lee et al. (2010). Here, however, we also derive convergence rates, much in the spirit of the classical deconvolution problems. Notice also that the fixed parameter asymptotics of the estimators of $p$ and $f$ were studied in Lee et al. (2010), in particular the rate of convergence of their estimator of $f$ (but not of $p$) was derived. On the other hand, we prefer to study asymptotics uniformly in $p$ and $f$, since fixed parameter statements are difficult to interpret from the asymptotic optimality point of view in nonparametric curve estimation, see e.g. Low et al. (1997) for a discussion. Furthermore, in case of estimation of $f$ we quantify the risk globally in terms of the mean integrated squared error and not pointwise by the mean squared error as done in Lee et al. (2010). We also derive a lower risk bound for estimation of $f$, which shows that our estimator is rate-optimal over an appropriate functional class. Our final results are lower bounds for estimation of $p$. These lower bounds entail rate-optimality of $p_{ng_n}$ in a large class of examples.

The structure of the paper can be outlined as follows: in Section 2 we state the main results of the paper. The proofs of these results are given in Section 3, while the Appendix contains several technical lemmas used in Section 3.



2. Results 

The classical deconvolution problems are usually divided into two groups, ordinary smooth deconvolution problems and supersmooth deconvolution problems, see e.g. Fan (1991) or p. 35 in Meister (2009). In the former case it is assumed that the characteristic function $\phi_Z$ of a random variable $Z$ decays to zero algebraically at plus and minus infinity (an example of such a $Z$ is a random variable with Laplace distribution), while in the latter case the decay is essentially exponential (for instance, $Z$ can be a normally distributed random variable). The rate of decay of $\phi_Z$ at infinity determines smoothness of the density of $Z$ and hence the names ordinary smooth and supersmooth. Here too we will adopt the distinction between ordinary smooth and supersmooth deconvolution problems. The ordinary smooth deconvolution problems for an atomic distribution will be defined by the following condition on $\phi_Z$.

Condition 1. Let $\phi_Z(t) \neq 0$ for all $t \in \mathbb{R}$ and let

$$d_0 |t|^{-\beta} \le |\phi_Z(t)| \quad \text{as } |t| \to \infty, \tag{4}$$

where $d_0$ and $\beta$ are some strictly positive constants. Furthermore, let $\phi_Z$ be integrable.

Remark 2. Note that the assumption of integrability of $\phi_Z$ puts a certain restriction on the tail behaviour of $\phi_Z$ and therefore implicitly on $\beta$ too. In particular, in order that Condition 1 does not lead to an empty assumption, we must have $\beta > 1$. Notice that a lower bound on the rate of decay of $\phi_Z$ as in (4) is needed in order to derive upper risk bounds for the estimators $p_{ng_n}$ and $f_{nh_ng_n}$, cf. p. 1260 in Fan (1991) and p. 35 in Meister (2009). When deriving lower bounds for estimation of $p$ and $f$, (4) has to be further refined by adding an explicit upper bound on the rate of decay of $\phi_Z$, see below. □

For the supersmooth deconvolution problems for an atomic distribution we will need the following condition on $\phi_Z$.

Condition 2. Let $\phi_Z(t) \neq 0$ for all $t \in \mathbb{R}$ and let

$$d_0 |t|^{\beta_0} e^{-|t|^{\beta}/\gamma} \le |\phi_Z(t)| \quad \text{as } |t| \to \infty, \tag{5}$$

where $\beta_0$ is some real constant and $d_0$, $\beta$ and $\gamma$ are some strictly positive constants. Furthermore, let $\phi_Z$ be integrable.
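As a quick sanity check of the two decay conditions, the sketch below evaluates the characteristic functions of the standard Laplace and standard normal distributions on a grid. The constants used ($d_0 = 1/2$, $\beta = 2$ for the Laplace case on $|t| \ge 1$, and $d_0 = 1$, $\beta_0 = 0$, $\beta = 2$, $\gamma = 2$ for the normal case) are choices made for this illustration; the function names are hypothetical.

```python
import numpy as np

def laplace_cf(t):
    # characteristic function of the standard Laplace distribution
    return 1.0 / (1.0 + t**2)

def normal_cf(t):
    # characteristic function of the standard normal distribution
    return np.exp(-0.5 * t**2)

t = np.linspace(1.0, 50.0, 2000)

# Ordinary smooth case: d0 * |t|^(-beta) <= |phi_Z(t)| with d0 = 1/2, beta = 2.
ordinary_ok = bool(np.all(0.5 * t**-2.0 <= laplace_cf(t)))

# Supersmooth case: d0 * |t|^beta0 * exp(-|t|^beta / gamma) <= |phi_Z(t)|
# with d0 = 1, beta0 = 0, beta = 2, gamma = 2 (equality for the normal law).
supersmooth_ok = bool(np.all(1.0 * t**0.0 * np.exp(-t**2.0 / 2.0) <= normal_cf(t)))

print(ordinary_ok, supersmooth_ok)
```

Both characteristic functions are also integrable, as the two conditions require; for the Laplace case this is in line with the restriction $\beta > 1$.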

Next we need to impose conditions on the class of target densities $f$.

Condition 3. Define the class of target densities $f$ as

$$\Sigma(\alpha, K_\Sigma) = \left\{ f : \int_{-\infty}^{\infty} |\phi_f(t)|^2 (1 + |t|^{2\alpha})\,dt \le K_\Sigma \right\}. \tag{6}$$

Here $\alpha$ and $K_\Sigma$ are some strictly positive numbers.

Smoothness conditions of this type are typical in nonparametric curve estimation problems, cf. p. 25 in Tsybakov (2009) or p. 34 in Meister (2009). Some smoothness assumptions have to be imposed on the class of target densities, because e.g. the class of all continuous densities is usually too large to be handled when dealing with uniform asymptotics. A possibility, different from Condition 3, is to assume that $f$ belongs to the class of supersmooth densities

$$\Sigma(\alpha, \gamma, K_\Sigma) = \left\{ f : \int_{-\infty}^{\infty} |\phi_f(t)|^2 \exp(2\gamma |t|^{\alpha})\,dt \le K_\Sigma \right\}$$

for some strictly positive $\alpha$, $\gamma$ and $K_\Sigma$. The class $\Sigma(\alpha, \gamma, K_\Sigma)$ is much smaller than the class $\Sigma(\alpha, K_\Sigma)$ and the estimators $p_{ng_n}$ and $f_{nh_ng_n}$ will enjoy better convergence rates in this case than in the case when the class of target densities is $\Sigma(\alpha, K_\Sigma)$, cf. Butucea and Tsybakov (2008a) and Butucea and Tsybakov (2008b) for a similar result in the classical deconvolution problem. In order not to overstretch the length of the paper, we decided however not to cover this case in the present work.

Remark 3. In the sequel we will use the symbols $\lesssim$ and $\gtrsim$ to compare two sequences $a_n$ and $b_n$ indexed by $n$, meaning respectively that $a_n$ is less than or equal to $b_n$ for all $n$, or greater than or equal to $b_n$, up to a universal constant that does not depend on $n$. □

The following theorem deals with asymptotics of the estimator $p_{ng_n}$. Its proof, as well as the proofs of all other results of the paper, is given in Section 3.

Theorem 1. Let a function $u$ be such that its Fourier transform $\phi_u$ is symmetric, real-valued, continuous in some neighbourhood of zero and is supported on $[-1, 1]$. Furthermore, let

$$\int_{-1}^{1} \phi_u(t)\,dt = 2, \qquad \left| \frac{\phi_u(t)}{t^{\alpha}} \right| \le U \ \text{ for all } t \in \mathbb{R}, \tag{7}$$

where the constant $\alpha$ is the same as in Condition 3, $U$ is a strictly positive constant and for $t = 0$ the ratio $\phi_u(t) t^{-\alpha}$ is defined by continuity at zero as the limit $\lim_{t \to 0} \phi_u(t) t^{-\alpha}$, which we assume to exist. Then

(i) under Condition 1, by selecting $g_n = d\, n^{-1/(2\alpha + 2\beta)}$ for some constant $d > 0$, we have

$$\sup_{f \in \Sigma(\alpha, K_\Sigma),\, p \in [0,1)} \mathrm{E}\big[(p_{ng_n} - p)^2\big] \lesssim n^{-(2\alpha+1)/(2\alpha+2\beta)}; \tag{8}$$

(ii) under Condition 2, by selecting $g_n = (4/\gamma)^{1/\beta} (\log n)^{-1/\beta}$, we have

$$\sup_{f \in \Sigma(\alpha, K_\Sigma),\, p \in [0,1)} \mathrm{E}\big[(p_{ng_n} - p)^2\big] \lesssim (\log n)^{-(2\alpha+1)/\beta}. \tag{9}$$

Thus the rate of convergence of the estimator $p_{ng_n}$ is slower than the root-$n$ rate for estimation of a finite-dimensional parameter in regular parametric models. For Theorem 1 (ii) this is evident, while for Theorem 1 (i) this follows from Remark 2, which entails the fact that $2\alpha + 1 < 2\alpha + 2\beta$. However, see Theorems 4 and 5 below, where for a practically important case of a normally distributed $Z$, as well as $Z$ with ordinary smooth distribution, by establishing the lower bounds for estimation of $p$ we show that the slow convergence rate is intrinsic to the deconvolution problem and is not a quirk of our particular estimator.
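The bandwidth in Theorem 1 (i) can be read off from a bias-variance balance. The following worked computation is a sketch that takes for granted the two bounds obtained in the proof in Section 3, namely a squared bias of order $g_n^{2\alpha+1}$ and a variance of order $1/(n g_n^{2\beta-1})$:

```latex
\[
  g_n^{2\alpha+1} \asymp \frac{1}{n\, g_n^{2\beta-1}}
  \iff g_n^{2\alpha+2\beta} \asymp n^{-1}
  \iff g_n \asymp n^{-1/(2\alpha+2\beta)},
\]
\[
  \text{so that}\quad
  g_n^{2\alpha+1} \asymp n^{-(2\alpha+1)/(2\alpha+2\beta)} .
\]
% Since beta > 1 (Remark 2), the exponent (2 alpha + 1)/(2 alpha + 2 beta)
% is strictly smaller than 1, i.e. the rate is slower than the parametric
% rate n^{-1} for the mean square error.
```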

Remark 4. The function $u$ in the statement of Theorem 1 will not be a probability density, not even a function that integrates to one, and hence by calling it a kernel we somewhat abuse the established terminology in kernel estimation. Notice that condition (7) and the assumption $\alpha > 0$ in Condition 3 preclude the kernel $u$ from being the sinc kernel. We refer to van Es et al. (2008) for one particular example of $u$ that produced good results in simulations. Its Fourier transform is given by

$$\phi_u(t) = \frac{693}{8}\, t^6 (1 - t^2)^2\, 1_{[-1,1]}(t).$$

Here $\alpha = 6$ and $U = 693/8$. An explicit, but rather complicated expression for $u$ can be found in van Es et al. (2008). □
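The two requirements on the kernel transform in Theorem 1 can be verified exactly for this example. The sketch below assumes that the transform reads $\phi_u(t) = (693/8)\,t^6(1-t^2)^2$ on $[-1,1]$ and uses exact rational arithmetic; the helper names `integral_phi_u` and `ratio` are hypothetical.

```python
from fractions import Fraction

# phi_u(t) = (693/8) * t^6 * (1 - t^2)^2 = (693/8) * (t^6 - 2 t^8 + t^10) on [-1, 1]
SCALE = Fraction(693, 8)
COEFFS = {6: 1, 8: -2, 10: 1}   # exponent -> coefficient

def integral_phi_u():
    """Exact integral of phi_u over [-1, 1]; the integrand is even."""
    half = sum(Fraction(c, k + 1) for k, c in COEFFS.items())   # integral over [0, 1]
    return 2 * SCALE * half

def ratio(t, alpha=6):
    """phi_u(t) / t^alpha, which equals (693/8) * (1 - t^2)^2 when alpha = 6."""
    t = Fraction(t)
    return SCALE * sum(c * t ** (k - alpha) for k, c in COEFFS.items())

grid = [Fraction(j, 100) for j in range(-100, 101)]
largest_ratio = max(abs(ratio(t)) for t in grid)
print(integral_phi_u(), largest_ratio)
```

The integral equals two and the ratio is largest at $t = 0$, which is consistent with the stated values $\alpha = 6$ and $U = 693/8$.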



Next we will study the asymptotic behaviour of the estimator $f_{nh_ng_n}$ of $f$. We select the mean integrated square error as a criterion of its performance.

Due to technical reasons, see the proof of Theorem 2, in the ordinary smooth case it is convenient to split the sample $X_1, \ldots, X_n$ into two parts and next to base the estimator $p_{ng_n}$ on the first part of the sample only, i.e. on $X_1, \ldots, X_{\lfloor n/2 \rfloor}$, and to redefine $f_{nh_ng_n}$ as

$$f_{nh_ng_n}(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx}\, \frac{\tilde{\phi}_{\mathrm{emp}}(t) - \hat{p}_{ng_n}\phi_Z(t)}{(1 - \hat{p}_{ng_n})\,\phi_Z(t)}\, \phi_w(h_n t)\,dt, \tag{10}$$

where

$$\tilde{\phi}_{\mathrm{emp}}(t) = \frac{1}{n - \lfloor n/2 \rfloor} \sum_{j = \lfloor n/2 \rfloor + 1}^{n} e^{itX_j}.$$

Thus $\tilde{\phi}_{\mathrm{emp}}$ is based on the second half of the sample $X_1, \ldots, X_n$ only. Note that $\mathrm{E}[\tilde{\phi}_{\mathrm{emp}}(t)] = \mathrm{E}[\phi_{\mathrm{emp}}(t)] = \phi_X(t)$. From now on we will assume that $p_{ng_n}$ and $f_{nh_ng_n}$ are defined in this way in the ordinary smooth case, but will retain the old definition in the supersmooth case. Splitting the sample does not affect the convergence rate of $f_{nh_ng_n}$ in the ordinary smooth case, but only the constant factor in the upper bound on its mean integrated squared error. The general case without sample splitting in principle can also be handled, but we anticipate longer and more technical proofs, cf. the remarks at the end of the proof of Theorem 2. Since in the present work we are only concerned with convergence rates, sample splitting does not lead to a significant loss of generality.

The following theorem holds.

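Theorem 2. Let a kernel $u$ satisfy the assumptions in Theorem 1. Furthermore, let a kernel $w$ be such that its Fourier transform $\phi_w$ is symmetric, real-valued, is supported on $[-1, 1]$ and

$$\phi_w(0) = 1, \qquad |\phi_w(t) - 1| \le W |t|^{\alpha} \ \text{ for all } t \in \mathbb{R}, \qquad \int_{-\infty}^{\infty} |\phi_w(t)|^2\,dt < \infty, \tag{11}$$

where $W$ is some strictly positive constant. Moreover, let $p \in [0, p^*]$, where $p^* < 1$. Then

(i) under Condition 1, by selecting $h_n = d\,(n - \lfloor n/2 \rfloor)^{-1/(2\alpha+2\beta+1)}$ for some $d > 0$, $g_n = d \lfloor n/2 \rfloor^{-1/(2\alpha+2\beta)}$ and $\epsilon_n = (\log 3n)^{-1}$, we have

$$\sup_{f \in \Sigma(\alpha, K_\Sigma),\, p \in [0, p^*]} \mathrm{E}\left[ \int_{-\infty}^{\infty} \big(f_{nh_ng_n}(x) - f(x)\big)^2 dx \right] \lesssim n^{-2\alpha/(2\alpha+2\beta+1)}, \tag{12}$$

where $f_{nh_ng_n}$ is defined by (10);

(ii) under Condition 2, by selecting $h_n = g_n = (4/\gamma)^{1/\beta} (\log n)^{-1/\beta}$ and $\epsilon_n = (\log 3n)^{-1}$, we have

$$\sup_{f \in \Sigma(\alpha, K_\Sigma),\, p \in [0, p^*]} \mathrm{E}\left[ \int_{-\infty}^{\infty} \big(f_{nh_ng_n}(x) - f(x)\big)^2 dx \right] \lesssim (\log n)^{-2\alpha/\beta}, \tag{13}$$

where $f_{nh_ng_n}$ is defined by (3).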
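The bandwidth $h_n$ in Theorem 2 (i) is the one familiar from classical ordinary smooth deconvolution. As a heuristic sketch only, suppose that the integrated squared bias is of order $h_n^{2\alpha}$ and, as an assumption borrowed from the classical problem, that the integrated variance is of order $1/(n h_n^{2\beta+1})$; the additional error caused by estimating $p$ is ignored here:

```latex
\[
  h_n^{2\alpha} \asymp \frac{1}{n\, h_n^{2\beta+1}}
  \iff h_n^{2\alpha+2\beta+1} \asymp n^{-1}
  \iff h_n \asymp n^{-1/(2\alpha+2\beta+1)},
\]
\[
  \text{so that}\quad
  h_n^{2\alpha} \asymp n^{-2\alpha/(2\alpha+2\beta+1)} .
\]
% Replacing n by n - floor(n/2) changes only the constant, not the rate.
```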



Remark 5. As will become clear from the proof of this theorem, without the assumption $p^* < 1$ one cannot study the asymptotics of $f_{nh_ng_n}$ uniformly in $(p, f)$ for $p \in [0, p^*]$ and $f \in \Sigma(\alpha, K_\Sigma)$. Since $p^*$ is allowed to be arbitrarily close to 1, from a practical point of view $p^* < 1$ is not an important restriction. Observe that one can also study the case when $p^* = p^*_n$ depends on the sample size $n$ and $p^*_n \to 1$ at a suitable rate. □

Remark 6. The condition $h_n = g_n$ in Theorem 2 (ii) is imposed for simplicity of the proofs only. In practice the two bandwidths need not be the same, cf. van Es et al. (2008), where unequal $h_n$ and $g_n$ are used in simulation examples. Also notice that our conditions on $h_n$ and $g_n$ in Theorems 1 and 2 are of asymptotic nature. For practical suggestions on bandwidth selection for the case when both $u$ and $w$ are sinc kernels, see Lee et al. (2010), where also a number of simulation examples is given. □



Remark 7. We refer to van Es et al. (2008) for one particular example of a kernel $w$. Any kernel that is known to produce good results in the classical deconvolution problem can be used as a kernel $w$. A relevant paper on the choice of a kernel in the context of the classical deconvolution problems is Delaigle and Hall (2006), to which we refer for a discussion and more examples. □

The upper risk bounds derived in Theorem 2 coincide with the upper risk bounds for kernel-type estimators in the classical deconvolution problems, i.e. in the case when $p$ is a priori known to be zero, see Theorem 2.9 in Meister (2009). Naturally, a discussion on the optimality of convergence rates of the estimators $f_{nh_ng_n}$ and $p_{ng_n}$ is in order. Let $\tilde{f}_n$ denote an arbitrary estimator of $f$ based on a sample $X_1, \ldots, X_n$. Consider

$$\mathcal{R}^*_n = \inf_{\tilde{f}_n} \sup_{f \in \Sigma,\, p \in [0, p^*]} \mathrm{E}\left[ \int_{-\infty}^{\infty} \big(\tilde{f}_n(x) - f(x)\big)^2 dx \right],$$

i.e. the minimax risk for estimation of $f$ over some functional class $\Sigma$ and the interval $[0, p^*]$ for $p$ that is associated with our statistical model, cf. p. 78 in Tsybakov (2009). Notice that

$$\mathcal{R}^*_n \ge \inf_{\tilde{f}_n} \sup_{f \in \Sigma,\, p = 0} \mathrm{E}\left[ \int_{-\infty}^{\infty} \big(\tilde{f}_n(x) - f(x)\big)^2 dx \right].$$

The quantity on the right-hand side coincides with the minimax risk for estimation of a density $f$ in the classical deconvolution problem, i.e. when $p = 0$ and the random variable $Y$ has a density $f$. Using this fact, by Theorem 2.14 of Meister (2009) it is easy to obtain lower bounds for $\mathcal{R}^*_n$, but first we need to formulate two additional conditions on the rate of decay of $\phi_Z$ at plus and minus infinity. These two conditions correspond to the ordinary smooth and supersmooth deconvolution problems, cf. Conditions 1 and 2.



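Condition 4. Let $\phi_Z$ be such that

$$|\phi_Z(t)| \le \frac{d_1}{|t|^{\beta}}, \qquad |\phi_Z'(t)| \le \frac{d_1}{|t|^{\beta}} \quad \text{for all } t \in \mathbb{R}$$

for some strictly positive constants $d_1$ and $\beta$.

Condition 5. Let $\phi_Z$ be such that

$$|\phi_Z(t)| \le d_1 e^{-|t|^{\beta}/\gamma}, \qquad |\phi_Z'(t)| \le d_1 e^{-|t|^{\beta}/\gamma} \quad \text{for all } t \in \mathbb{R}$$

for some strictly positive constants $d_1$, $\beta$ and $\gamma$.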

The following result holds.

Theorem 3. Let $\tilde{f}_n$ denote any estimator of $f$ based on a sample $X_1, \ldots, X_n$ and let $\alpha > 1/2$. Suppose that $K_\Sigma$ is large enough. Then

(i) under Condition 4 we have

$$\inf_{\tilde{f}_n} \sup_{f \in \Sigma(\alpha, K_\Sigma),\, p \in [0, p^*]} \mathrm{E}\left[ \int_{-\infty}^{\infty} \big(\tilde{f}_n(x) - f(x)\big)^2 dx \right] \gtrsim n^{-2\alpha/(2\alpha+2\beta+1)}; \tag{14}$$

(ii) under Condition 5 the inequality

$$\inf_{\tilde{f}_n} \sup_{f \in \Sigma(\alpha, K_\Sigma),\, p \in [0, p^*]} \mathrm{E}\left[ \int_{-\infty}^{\infty} \big(\tilde{f}_n(x) - f(x)\big)^2 dx \right] \gtrsim (\log n)^{-2\alpha/\beta} \tag{15}$$

holds.

These lower bounds are of the same order as the upper bounds in Theorem 2. It then follows that our estimator of $f$ is rate-optimal under the combined conditions in Theorems 2 and 3. For a discussion on the conditions in Theorem 3 see p. 35 in Meister (2009).

Derivation of the lower risk bounds for estimation of the probability $p$ appears to be more involved. We will establish the lower bound for the case when $Z$ follows the standard normal distribution. This is an important case, as the assumption of normality of measurement errors is frequently imposed in practice. The following result holds true.

Theorem 4. Let $Z$ have the standard normal distribution and let $\tilde{p}_n$ denote any estimator of $p$ based on a sample $X_1, \ldots, X_n$. Then

$$\inf_{\tilde{p}_n} \sup_{f \in \Sigma(\alpha, K_\Sigma),\, p \in [0,1)} \mathrm{E}\big[(\tilde{p}_n - p)^2\big] \gtrsim (\log n)^{-(\alpha + 1/2)} \tag{16}$$

holds.

A consequence of this theorem and Theorem 1 (ii) is that our estimator $p_{ng_n}$ is rate-optimal in the case when $Z$ follows the normal distribution.

The arguments used in the proof of Theorem 4 can be easily extended to the case when the distribution of $Z$ is ordinary smooth. Below we provide the corresponding statement in the ordinary smooth case.

Theorem 5. Let the characteristic function of $Z$ satisfy Condition 4 for $\beta > 1/2$. Let $\tilde{p}_n$ denote any estimator of $p$ based on the sample $X_1, \ldots, X_n$. Then

$$\inf_{\tilde{p}_n} \sup_{f \in \Sigma(\alpha, K_\Sigma),\, p \in [0,1)} \mathrm{E}\big[(\tilde{p}_n - p)^2\big] \gtrsim n^{-(2\alpha+1)/(2\alpha+2\beta)}$$

holds.

This theorem and Theorem 1 (i) imply that under the combined conditions in Theorems 1 (i) and 5 the estimator $p_{ng_n}$ is rate-optimal.



3. Proofs 



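Proof of Theorem 1. The proof uses some arguments from Fan (1991). To make the notation less cumbersome, let $\sup_{f,p} = \sup_{f \in \Sigma(\alpha, K_\Sigma),\, p \in [0,1)}$. We first prove (i). We have

$$\sup_{f,p} \mathrm{E}\big[(p_{ng_n} - p)^2\big] \le \sup_{f,p} \big(\mathrm{E}[p_{ng_n}] - p\big)^2 + \sup_{f,p} \mathrm{Var}[p_{ng_n}]. \tag{17}$$

Observe that

$$
\begin{aligned}
\big|\mathrm{E}[p_{ng_n}] - p\big|
&= \frac{1-p}{2} \left| \int_{-1}^{1} \phi_f\!\left(\frac{t}{g_n}\right) \phi_u(t)\,dt \right| \\
&\le \frac{1}{2}\, g_n^{\alpha} \int_{-1}^{1} \left|\phi_f\!\left(\frac{t}{g_n}\right)\right| \left|\frac{t}{g_n}\right|^{\alpha} \frac{|\phi_u(t)|}{|t|^{\alpha}}\, 1_{[t \neq 0]}\,dt \\
&\le \frac{1}{2}\, g_n^{\alpha} \left( \int_{-1}^{1} \left|\phi_f\!\left(\frac{t}{g_n}\right)\right|^2 \left|\frac{t}{g_n}\right|^{2\alpha} dt \right)^{1/2} \left( \int_{-1}^{1} \left(\frac{\phi_u(t)}{t^{\alpha}}\right)^{2} 1_{[t \neq 0]}\,dt \right)^{1/2} \\
&\lesssim g_n^{\alpha + 1/2},
\end{aligned}
\tag{18}
$$

where we used (7), (6) and the Cauchy-Schwarz inequality; the substitution $s = t/g_n$ shows that the first integral in the last product is at most $g_n K_\Sigma$. Therefore

$$\sup_{f,p} \big(\mathrm{E}[p_{ng_n}] - p\big)^2 \lesssim g_n^{2\alpha+1} \tag{19}$$

holds. Furthermore, using independence of the random variables $X_i$,

$$
\begin{aligned}
\mathrm{Var}[p_{ng_n}]
&= \frac{g_n^2}{4}\,\frac{1}{n}\, \mathrm{Var}\left[ \int_{-1/g_n}^{1/g_n} e^{itX_1} \frac{\phi_u(g_n t)}{\phi_Z(t)}\,dt \right]
\le \frac{g_n^2}{4}\,\frac{1}{n}\, \mathrm{E}\left[ \left| \int_{-1/g_n}^{1/g_n} e^{itX_1} \frac{\phi_u(g_n t)}{\phi_Z(t)}\,dt \right|^2 \right] \\
&= \frac{g_n^2}{4}\,\frac{1}{n} \int_{-\infty}^{\infty} \left| \int_{-1/g_n}^{1/g_n} e^{itx} \frac{\phi_u(g_n t)}{\phi_Z(t)}\,dt \right|^2 q(x)\,dx,
\end{aligned}
$$

where $q$ is the density of $X_1$. Notice that

$$q(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} \phi_Y(t)\phi_Z(t)\,dt \le \frac{1}{2\pi} \int_{-\infty}^{\infty} |\phi_Z(t)|\,dt < \infty,$$

where we used integrability of $\phi_Z$. Therefore

$$
\mathrm{Var}[p_{ng_n}]
\lesssim \frac{g_n^2}{n} \int_{-\infty}^{\infty} \left| \int_{-1/g_n}^{1/g_n} e^{itx} \frac{\phi_u(g_n t)}{\phi_Z(t)}\,dt \right|^2 dx
= 2\pi\, \frac{g_n^2}{n} \int_{-1/g_n}^{1/g_n} \left| \frac{\phi_u(g_n t)}{\phi_Z(t)} \right|^2 dt
= 2\pi\, \frac{g_n}{n} \int_{-1}^{1} \left| \frac{\phi_u(t)}{\phi_Z(t/g_n)} \right|^2 dt
$$

by Parseval's identity. This inequality and an argument as on p. 1266 of Fan (1991) entail that

$$\sup_{f,p} \mathrm{Var}[p_{ng_n}] \lesssim \frac{1}{n g_n^{2\beta - 1}}. \tag{20}$$

Formula (8) is then a consequence of (17), (19), (20) and our specific choice of $g_n$ in (i).

Now we prove (ii). Since the first term on the right-hand side of (17) can be treated as in the ordinary smooth case (in particular (19) holds), we concentrate on the second term. Using independence of the random variables $X_i$,

$$
\mathrm{Var}[p_{ng_n}]
= \frac{1}{4}\,\frac{1}{n}\, \mathrm{Var}\left[ \int_{-1}^{1} e^{itX_1/g_n} \frac{\phi_u(t)}{\phi_Z(t/g_n)}\,dt \right]
\le \frac{1}{4}\,\frac{1}{n} \left( \int_{-1}^{1} \frac{|\phi_u(t)|}{|\phi_Z(t/g_n)|}\,dt \right)^{2}. \tag{21}
$$

By the same arguments as on pp. 1265-1266 of Fan (1991), one can show that

$$
\int_{-1}^{1} \frac{|\phi_u(t)|}{|\phi_Z(t/g_n)|}\,dt \le
\begin{cases}
C e^{1/(\gamma g_n^{\beta})}, & \text{if } \beta_0 \ge 0, \\
C g_n^{\beta_0} e^{1/(\gamma g_n^{\beta})}, & \text{if } \beta_0 < 0,
\end{cases}
\tag{22}
$$

where the constant $C$ does not depend on $n$. In either case, because of our choice of $g_n$, the right-hand side of (22) is of order $o(n^{1/3})$. This and (21) imply that

$$\sup_{f,p} \mathrm{Var}[p_{ng_n}] = o(n^{-1/3}).$$

The latter together with (17), (19) and our choice of $g_n$ in (ii) proves (9). □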
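The order statement for the supersmooth variance term can be checked by direct substitution of the bandwidth from Theorem 1 (ii). The following worked computation is a sketch under that bandwidth choice, $g_n = (4/\gamma)^{1/\beta}(\log n)^{-1/\beta}$:

```latex
\[
  g_n^{\beta} = \frac{4}{\gamma \log n}
  \quad\Longrightarrow\quad
  \frac{1}{\gamma g_n^{\beta}} = \frac{\log n}{4}
  \quad\Longrightarrow\quad
  e^{1/(\gamma g_n^{\beta})} = n^{1/4}.
\]
% The extra factor g_n^{beta_0} (for beta_0 < 0) is a power of log n, so the
% integral bounded in the proof is at most C n^{1/4} (log n)^{c} = o(n^{1/3}).
% Squaring and dividing by n gives a variance of order
% n^{-1/2} (log n)^{2c} = o(n^{-1/3}),
% which is negligible against the squared bias
\[
  g_n^{2\alpha+1} \asymp (\log n)^{-(2\alpha+1)/\beta}.
\]
```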



Proof of Theorem\^ We use the shorthand notation supj^ = sup f ^^r^j^^ypfziQ^p-] ■ 
By Fubini's theorem and the standard squared bias plus variance decomposition we 
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have 



supE 



{fnh„g„ix) - f{x)fdx 



<sup/ {E[f„h„g^{x)]-f{x))^dx 

f.P J -OO 

/oc 
-OO 



= Ti+T2. 
Keeping in mind the remarks surrounding (jlOp . let 



/n/i„ (a;) = — 
Ztt 

in the ordinary smooth case, while 

1 

2^ 

in the supersmooth case. Introduce 



-itx 9^emp\t)(Pw(lT-nt) J, 



KO 



fnh„ (x) 



^-^tx 4'emp{t)(i3w{Kt) ^^ 



'Xt) 



(23) 



„ , . fnh„{x) p . , 

Jnh„[X) = — Wh^^{x), 



1 — P 1 ^ P 

where Wh,^{x) = {\ / hn)w{x / hn) . We first study Ti, i.e. the supremum of the inte- 
grated squared bias. By the C2-inequality it can be bounded as 

/OO 
{nfnh,Ax)]-I{x)fdx 
-OO 
/OO 
(IE [/n/i„ff„ {x) - fnh„ {x)])'^dx 
-OO 

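By Parseval's identity and the dominated convergence theorem

$$\int_{-\infty}^{\infty}(E[\tilde f_{nh_n}(x)]-f(x))^2\,dx = \frac{1}{2\pi}\int_{-\infty}^{\infty}|\phi_f(t)|^2|\phi_w(h_nt)-1|^2\,dt = h_n^{2\alpha}\frac{1}{2\pi}\int_{-\infty}^{\infty}|t|^{2\alpha}|\phi_f(t)|^2\frac{|\phi_w(h_nt)-1|^2}{|h_nt|^{2\alpha}}1_{[t\neq 0]}\,dt \lesssim h_n^{2\alpha}.$$

Here in the second equality we used the fact that $\phi_w(0) = 1$. The dominated convergence theorem is applicable because of Condition 3 and (11). Hence $T_3 \lesssim h_n^{2\alpha}$ in view of the fact that $f \in \Sigma(\alpha,K_\Sigma)$. It is also straightforward to see that in fact $\sup_{f,p}T_3 \lesssim h_n^{2\alpha}$. We deal with $T_4$. By the $c_2$-inequality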

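$$\int_{-\infty}^{\infty}(E[f_{nh_ng_n}(x)-\tilde f_{nh_n}(x)])^2\,dx \lesssim \left(E\left[\frac{\hat p_{ng_n}-p}{(1-\hat p_{ng_n})(1-p)}\right]\right)^2\int_{-\infty}^{\infty}(w_{h_n}(x))^2\,dx + \int_{-\infty}^{\infty}\left(E\left[f_{nh_n}(x)\frac{\hat p_{ng_n}-p}{(1-\hat p_{ng_n})(1-p)}\right]\right)^2dx = T_5+T_6.$$

Notice that

$$\int_{-\infty}^{\infty}(w_{h_n}(x))^2\,dx = \frac{1}{h_n}\int_{-\infty}^{\infty}(w(x))^2\,dx < \infty,$$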






because by our assumptions and Parseval's identity $w$ is square integrable. We first consider $T_5$. By the Cauchy-Schwarz inequality we have

$$T_5 \le \frac{1}{h_n}\int_{-\infty}^{\infty}(w(u))^2\,du\; E\left[\frac{(\hat p_{ng_n}-p)^2}{(1-\hat p_{ng_n})^2(1-p)^2}\right].$$



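With our choice of the smoothing parameters $h_n$ and $g_n$ it follows from Lemma 2 of the Appendix that $\sup_{f,p}T_5 \lesssim g_n^{2\alpha}$. Now let us turn to $T_6$. By the Cauchy-Schwarz inequality

$$T_6 \le E\left[\frac{(\hat p_{ng_n}-p)^2}{(1-\hat p_{ng_n})^2(1-p)^2}\right]\int_{-\infty}^{\infty}E[(f_{nh_n}(x))^2]\,dx.$$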

By Lemma 2 of the Appendix the first factor in the product in the above display is of order $g_n^{2\alpha+1}$. The same holds true for its supremum over $f$ and $p$. Hence it remains to study the second factor in the above upper bound on $T_6$. We have

$$\int_{-\infty}^{\infty}E[(f_{nh_n}(x))^2]\,dx = \int_{-\infty}^{\infty}\operatorname{Var}[f_{nh_n}(x)]\,dx + \int_{-\infty}^{\infty}(E[f_{nh_n}(x)])^2\,dx = T_7+T_8.$$



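Let the function $W_n$ be defined by

$$W_n(x) = \frac{1}{2\pi}\int_{-1}^{1}e^{-itx}\frac{\phi_w(t)}{\phi_Z(t/h_n)}\,dt.$$

Notice that by independence of the $X_i$'s

$$T_7 = \frac{1}{nh_n^2}\int_{-\infty}^{\infty}\operatorname{Var}\left[W_n\left(\frac{x-X_1}{h_n}\right)\right]dx \le \frac{1}{nh_n^2}\int_{-\infty}^{\infty}E\left[\left(W_n\left(\frac{x-X_1}{h_n}\right)\right)^2\right]dx$$

in the supersmooth case, and

$$T_7 = \frac{1}{(n-\lfloor n/2\rfloor)h_n^2}\int_{-\infty}^{\infty}\operatorname{Var}\left[W_n\left(\frac{x-X_1}{h_n}\right)\right]dx \le \frac{1}{(n-\lfloor n/2\rfloor)h_n^2}\int_{-\infty}^{\infty}E\left[\left(W_n\left(\frac{x-X_1}{h_n}\right)\right)^2\right]dx$$

in the ordinary smooth case. Then by Fubini's theorem

$$T_7 \le \frac{1}{nh_n^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\left(W_n\left(\frac{x-s}{h_n}\right)\right)^2q(s)\,ds\,dx = \frac{1}{nh_n^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\left(W_n\left(\frac{x-s}{h_n}\right)\right)^2dx\,q(s)\,ds = \frac{1}{nh_n}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}(W_n(x))^2\,dx\,q(s)\,ds = \frac{1}{2\pi nh_n}\int_{-1}^{1}\frac{|\phi_w(t)|^2}{|\phi_Z(t/h_n)|^2}\,dt$$

in the supersmooth case, and

$$T_7 \le \frac{1}{2\pi(n-\lfloor n/2\rfloor)h_n}\int_{-1}^{1}\frac{|\phi_w(t)|^2}{|\phi_Z(t/h_n)|^2}\,dt$$

in the ordinary smooth case. Here we used the fact that $q$, being a probability density, integrates to one, as well as Parseval's identity. The integrals in the last equalities of the above two displayed formulae can be analysed by exactly the same arguments as on pp. 1265-1266 in Fan (1991). Thus

$$T_7 \lesssim \begin{cases}\dfrac{1}{nh_n^{2\beta+1}}, & \text{if } Z \text{ is ordinary smooth},\\[2mm] \dfrac{1}{nh_n}e^{2/(\gamma h_n^{\beta})}, & \text{if } Z \text{ is supersmooth and } \beta_0 \ge 0,\\[2mm] \dfrac{1}{nh_n}h_n^{2\beta_0}e^{2/(\gamma h_n^{\beta})}, & \text{if } Z \text{ is supersmooth and } \beta_0 < 0.\end{cases} \tag{24}$$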

The same order bounds hold for $\sup_{f,p}T_7$ as well. As a consequence, $\sup_{f,p}T_7 \to 0$. Let us now study $T_8$. By Parseval's identity and the fact that $|\phi_Y(t)| \le 1$, we have

$$T_8 = \frac{1}{2\pi}\int_{-\infty}^{\infty}|\phi_Y(t)\phi_w(h_nt)|^2 1_{[-h_n^{-1},h_n^{-1}]}(t)\,dt \le \frac{1}{2\pi}\frac{1}{h_n}\int_{-1}^{1}|\phi_w(t)|^2\,dt \lesssim \frac{1}{h_n},$$

where the last line follows from our assumptions on $w$. It follows that $\sup_{f,p}T_8 \lesssim 1/h_n$. Combination of the above bounds on $\sup_{f,p}T_7$ and $\sup_{f,p}T_8$ entails that $\sup_{f,p}T_6 \lesssim g_n^{2\alpha}$, where we also used the fact that $g_n \le h_n$. Therefore $T_4$, as well as $T_1$, i.e. the supremum of the integrated squared bias, is of order $h_n^{2\alpha}$. For the ordinary smooth case this gives an upper bound of order $n^{-2\alpha/(2\alpha+2\beta+1)}$, while for the supersmooth case an upper bound of order $(\log n)^{-2\alpha/\beta}$.
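The ordinary smooth rate quoted above is what one obtains by balancing a squared bias of order $h_n^{2\alpha}$ against a variance term of order $1/(nh_n^{2\beta+1})$. The sketch below makes this arithmetic explicit; the function names are illustrative, and the supersmooth helper simply evaluates the stated order $(\log n)^{-2\alpha/\beta}$ rather than deriving a bandwidth.

```python
import math

def ordinary_smooth_bandwidth(n, alpha, beta):
    # Solves h^(2*alpha) = 1 / (n * h^(2*beta + 1)) for h.
    return n ** (-1.0 / (2.0 * alpha + 2.0 * beta + 1.0))

def ordinary_smooth_rate(n, alpha, beta):
    # Resulting order n^(-2*alpha / (2*alpha + 2*beta + 1)).
    return n ** (-2.0 * alpha / (2.0 * alpha + 2.0 * beta + 1.0))

def supersmooth_rate(n, alpha, beta):
    # Stated order (log n)^(-2*alpha / beta) in the supersmooth case.
    return math.log(n) ** (-2.0 * alpha / beta)

if __name__ == "__main__":
    n, alpha, beta = 10**6, 2.0, 1.0
    h = ordinary_smooth_bandwidth(n, alpha, beta)
    # Squared bias and variance orders coincide at the balancing bandwidth.
    print(h ** (2 * alpha), 1.0 / (n * h ** (2 * beta + 1)))
    print(ordinary_smooth_rate(n, alpha, beta), supersmooth_rate(n, alpha, 2.0))
```

The comparison of the two printed rates illustrates how much slower the logarithmic rate is for the same sample size.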

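Now we turn to $T_2$, i.e. the supremum of the integrated variance. We have

$$\int_{-\infty}^{\infty}\operatorname{Var}[f_{nh_ng_n}(x)-\tilde f_{nh_n}(x)+\tilde f_{nh_n}(x)]\,dx \lesssim \int_{-\infty}^{\infty}\operatorname{Var}[\tilde f_{nh_n}(x)]\,dx + \int_{-\infty}^{\infty}\operatorname{Var}[f_{nh_ng_n}(x)-\tilde f_{nh_n}(x)]\,dx = T_9+T_{10},$$

where we used the fact that for random variables $\xi$ and $\eta$

$$\operatorname{Var}[\xi+\eta] \le 2\,(\operatorname{Var}[\xi]+\operatorname{Var}[\eta]).$$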
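The variance inequality used here follows from the identity $2(\operatorname{Var}[\xi]+\operatorname{Var}[\eta])-\operatorname{Var}[\xi+\eta]=\operatorname{Var}[\xi-\eta]\ge 0$. A minimal numerical check on paired samples, with an illustrative helper name, could look as follows.

```python
import random
import statistics

def variance_inequality_gap(xs, ys):
    # Returns 2 * (Var[x] + Var[y]) - Var[x + y] for paired samples,
    # using population variances of the empirical distribution.
    sums = [x + y for x, y in zip(xs, ys)]
    return (2.0 * (statistics.pvariance(xs) + statistics.pvariance(ys))
            - statistics.pvariance(sums))

if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    ys = [0.5 * x + rng.gauss(0.0, 2.0) for x in xs]
    # The gap equals the empirical variance of x - y, hence is nonnegative.
    print(variance_inequality_gap(xs, ys))
```

The gap vanishes exactly when $\xi-\eta$ is constant, which is why the factor 2 cannot be improved in general.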

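Since $T_9$ up to a constant is the same as $T_7$, cf. (23), the term $\sup_{f,p}T_9$ can be bounded as before, see (24). We consider $T_{10}$. Let $\psi_n$ be as in (34) in the proof of Lemma 2 of the Appendix. Then

$$T_{10} \le \int_{-\infty}^{\infty}E\left[(f_{nh_ng_n}(x)-\tilde f_{nh_n}(x))^2 1_{[|\hat p_{ng_n}-p|>\psi_n]}\right]dx + \int_{-\infty}^{\infty}E\left[(f_{nh_ng_n}(x)-\tilde f_{nh_n}(x))^2 1_{[|\hat p_{ng_n}-p|\le\psi_n]}\right]dx = T_{11}+T_{12}.$$

By the $c_2$-inequality

$$T_{11} \lesssim \frac{1}{h_n}\int_{-\infty}^{\infty}(w(x))^2\,dx\;E\left[\frac{(\hat p_{ng_n}-p)^2}{(1-\hat p_{ng_n})^2(1-p)^2}1_{[|\hat p_{ng_n}-p|>\psi_n]}\right] + \int_{-\infty}^{\infty}E\left[(f_{nh_n}(x))^2\frac{(\hat p_{ng_n}-p)^2}{(1-\hat p_{ng_n})^2(1-p)^2}1_{[|\hat p_{ng_n}-p|>\psi_n]}\right]dx = T_{13}+T_{14}.$$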

Since $T_{13} \lesssim h_n^{-1}\epsilon_n^{-2}\sup_{f,p}P(|\hat p_{ng_n}-p|>\psi_n)$, which follows from the fact that

$$\frac{(\hat p_{ng_n}-p)^2}{(1-\hat p_{ng_n})^2(1-p)^2} \le \frac{4}{\epsilon_n^2(1-p^*)^2},$$

by Lemma 3 of the Appendix with our conditions on $h_n$ and $\epsilon_n$ it certainly holds true that $\sup_{f,p}T_{13} \lesssim h_n^{2\alpha}$. As far as $T_{14}$ is concerned, by Fubini's theorem and Parseval's identity



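$$T_{14} = E\left[\frac{(\hat p_{ng_n}-p)^2}{(1-\hat p_{ng_n})^2(1-p)^2}1_{[|\hat p_{ng_n}-p|>\psi_n]}\int_{-\infty}^{\infty}(f_{nh_n}(x))^2\,dx\right] = E\left[\frac{(\hat p_{ng_n}-p)^2}{(1-\hat p_{ng_n})^2(1-p)^2}1_{[|\hat p_{ng_n}-p|>\psi_n]}\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{|\phi_{emp}(t)\phi_w(h_nt)|^2}{|\phi_Z(t)|^2}\,dt\right].$$

Hence, since the empirical characteristic function is bounded by one in absolute value,

$$T_{14} \lesssim \frac{1}{\epsilon_n^2}\frac{1}{h_n}\int_{-1}^{1}\frac{|\phi_w(t)|^2}{|\phi_Z(t/h_n)|^2}\,dt\;P(|\hat p_{ng_n}-p|>\psi_n),$$

so that

$$T_{14} \lesssim \frac{1}{\epsilon_n^2h_n^{2\beta+1}}P(|\hat p_{ng_n}-p|>\psi_n)$$

in the ordinary smooth case, and

$$T_{14} \lesssim \begin{cases}\dfrac{1}{\epsilon_n^2}\dfrac{1}{h_n}e^{2/(\gamma h_n^{\beta})}P(|\hat p_{ng_n}-p|>\psi_n), & \text{if } \beta_0 \ge 0,\\[2mm] \dfrac{1}{\epsilon_n^2}h_n^{2\beta_0-1}e^{2/(\gamma h_n^{\beta})}P(|\hat p_{ng_n}-p|>\psi_n), & \text{if } \beta_0 < 0\end{cases}$$

in the supersmooth case, cf. pp. 1265-1266 of Fan (1991). Similar order bounds are true for $\sup_{f,p}T_{14}$. Again by Lemma 3 and our conditions on $h_n$ and $\epsilon_n$, we have

$$\sup_{f,p}T_{14} \lesssim h_n^{2\alpha}.$$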

To complete establishing an upper bound on $T_{10}$, it remains to study $T_{12}$. As in the case of $T_{11}$, by the $c_2$-inequality

$$T_{12} \lesssim \frac{1}{h_n}\int_{-\infty}^{\infty}(w(x))^2\,dx\;E\left[\frac{(\hat p_{ng_n}-p)^2}{(1-\hat p_{ng_n})^2(1-p)^2}1_{[|\hat p_{ng_n}-p|\le\psi_n]}\right] + \int_{-\infty}^{\infty}E\left[(f_{nh_n}(x))^2\frac{(\hat p_{ng_n}-p)^2}{(1-\hat p_{ng_n})^2(1-p)^2}1_{[|\hat p_{ng_n}-p|\le\psi_n]}\right]dx$$

holds. By Lemma 1 of the Appendix the first term on the right-hand side is up to a constant bounded by $(1/h_n)g_n^{2\alpha+1}$ and hence is of order $g_n^{2\alpha}$. The same is true for its supremum over $p$ and $f$. As far as the second term is concerned, in the supersmooth case it is bounded, up to a constant, by $\psi_n^2\int_{-\infty}^{\infty}E[(f_{nh_n}(x))^2]\,dx$. It follows from the upper bounds on $\sup_{f,p}T_7$ and $\sup_{f,p}T_8$ that in the supersmooth case we have $\sup_{f,p}T_{12} \lesssim h_n^{2\alpha}$. As far as the ordinary smooth case is concerned,

$$\int_{-\infty}^{\infty}E\left[(f_{nh_n}(x))^2\frac{(\hat p_{ng_n}-p)^2}{(1-\hat p_{ng_n})^2(1-p)^2}1_{[|\hat p_{ng_n}-p|\le\psi_n]}\right]dx = \int_{-\infty}^{\infty}E\left[(f_{nh_n}(x))^2\right]dx\;E\left[\frac{(\hat p_{ng_n}-p)^2}{(1-\hat p_{ng_n})^2(1-p)^2}1_{[|\hat p_{ng_n}-p|\le\psi_n]}\right]$$

holds. This is precisely the place where we use independence between $f_{nh_n}(x)$ and $\hat p_{ng_n}$ implied by sample splitting, cf. the remarks around (10). Then in this case too $\sup_{f,p}T_{12} \lesssim h_n^{2\alpha}$. Had we not used the sample splitting trick, in the above display we would have had to apply the Cauchy-Schwarz inequality, apparently leading to rather lengthy computations.

Combination of the bounds on $\sup_{f,p}T_{11}$ and $\sup_{f,p}T_{12}$ implies that $\sup_{f,p}T_{10} \lesssim h_n^{2\alpha}$. The bounds on $\sup_{f,p}T_9$ and $\sup_{f,p}T_{10}$ induce the bound on $T_2$. The statement of the theorem then follows from the bounds on $T_1$ and $T_2$. □

Proof of Theorem 3. The result is a straightforward consequence of Theorem 2.14 of Meister (2009). □



Proof of Theorem 4. A general idea of the proof can be outlined as follows: we will consider two pairs $(p_1,f_1)$ and $(p_2,f_2)$ (depending on $n$) of the parameter $(p,f)$ that parametrises the density of $X$, such that the probabilities $p_1$ and $p_2$ are separated as much as possible, while at the same time the corresponding product densities $q_1^{\otimes n}$ and $q_2^{\otimes n}$ of the observations $X_1,\ldots,X_n$ are close in the $\chi^2$-divergence and hence cannot be distinguished well using the observations $X_1,\ldots,X_n$. By Lemma 8 of Butucea and Tsybakov (2008a) the squared distance between $p_1$ and $p_2$ will then give (up to a constant that does not depend on $n$) the desired lower bound (16) for estimation of $p$.

Our construction of the two alternatives $(p_1,f_1)$ and $(p_2,f_2)$ is partially motivated by the construction used in the proof of Theorem 3.5 of Chen et al. (2010).



Let $\lambda_1 = \lambda + \delta^{\alpha+1/2}$, where $\lambda > 0$ is a fixed constant and $\delta \downarrow 0$ as $n \to \infty$. Define $p_1 = e^{-\lambda_1}$ and notice that $p_1 \in [0,1)$. Next set $\phi_{g_1}(t) = e^{-|t|}$ and observe that this is the characteristic function corresponding to the Cauchy density $g_1(x) = 1/(\pi(1+x^2))$. Finally, define

$$\phi_{f_1}(t) = \frac{1}{e^{\lambda_1}-1}\left(e^{\lambda_1\phi_{g_1}(t)}-1\right).$$

Denote by $W_j$ the i.i.d. random variables that have the common density $g_1$ and by $N_{\lambda_1}$ the random variable that has Poisson distribution with parameter $\lambda_1$. Then the function $\phi_{f_1}$ will be the characteristic function corresponding to the density $f_1$ of the Poisson sum $Y = \sum_{j=1}^{N_{\lambda_1}}W_j$ of i.i.d. $W_j$'s conditional on the fact that the number of its summands $N_{\lambda_1} > 0$, see pp. 14-15 of Gugushvili (2008). Notice that we have an inequality

$$|\phi_{f_1}(t)| \lesssim |\phi_{g_1}(t)|,$$

cf. inequality (2.10) on p. 22 of Gugushvili (2008). Keeping this inequality in mind, without loss of generality we can assume that $K_\Sigma$ is already such that $f_1 \in \Sigma(\alpha,K_\Sigma/4)$. Otherwise we can always consider $\phi_{g_1}(t) = e^{-\alpha'|t|}$ with a fixed and large enough constant $\alpha' > 0$, so that $f_1 \in \Sigma(\alpha,K_\Sigma/4)$. It is not difficult to see that the fact that $\alpha' \neq 1$ will not seriously affect our subsequent argumentation in this proof. Next define the density $q_1$ corresponding to the pair $(p_1,f_1)$ via its characteristic function

$$\phi_{q_1}(t) = \left(p_1+(1-p_1)\phi_{f_1}(t)\right)e^{-t^2/2}$$

and remark that it has the convolution structure required for our problem.
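The first alternative can be checked numerically. The sketch below, with illustrative function names, compares the closed form of the characteristic function of a Poisson sum of Cauchy variables conditioned on having at least one summand with its series representation, and verifies that mixing it with an atom of mass $e^{-\lambda}$ at zero gives the compound Poisson characteristic function $e^{\lambda(\phi_{g_1}(t)-1)}$.

```python
import math

def phi_g1(t):
    # Characteristic function of the standard Cauchy density.
    return math.exp(-abs(t))

def phi_f1(t, lam):
    # Closed form: Poisson sum of Cauchy variables conditioned on N > 0.
    return (math.exp(lam * phi_g1(t)) - 1.0) / (math.exp(lam) - 1.0)

def phi_f1_series(t, lam, terms=80):
    # Same quantity as sum_{k>=1} P(N = k | N > 0) * phi_g1(t)^k.
    total = 0.0
    prob = math.exp(-lam)  # P(N = 0)
    for k in range(1, terms + 1):
        prob *= lam / k    # now P(N = k)
        total += prob * phi_g1(t) ** k
    return total / (1.0 - math.exp(-lam))

def phi_y(t, lam):
    # Atom of mass p = exp(-lam) at zero mixed with the conditional law.
    p = math.exp(-lam)
    return p + (1.0 - p) * phi_f1(t, lam)

if __name__ == "__main__":
    lam = 1.3
    for t in (0.0, 0.3, 1.5, 4.0):
        print(t, phi_f1(t, lam), phi_f1_series(t, lam),
              phi_y(t, lam), math.exp(lam * (phi_g1(t) - 1.0)))
```

The last two printed columns agree, which is the identity that makes differences between the two alternatives easy to express through exponentials.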

Now we proceed to the definition of the second alternative $(p_2,f_2)$. Set $\lambda_2 = \lambda$ and $p_2 = e^{-\lambda_2}$. The fact that $p_2 \in [0,1)$ follows from the fact that $\lambda > 0$. Let $H$ be a function such that its Fourier transform $\phi_H$ is symmetric and real-valued with support on $[-2,2]$, $\phi_H(t) = 1$ for $t \in [-1,1]$ and $\phi_H$ is two times continuously differentiable. Such a function can be constructed e.g. in the same way as a flat-top kernel in Section 3 of McMurry and Politis (2004). Define

$$\phi_{g_2}(t) = \phi_{g_1}(t)+\tau(t),$$

where the perturbation function $\tau$ is given by

$$\tau(t) = \frac{\delta^{\alpha+1/2}}{\lambda_2}\left(\phi_{g_1}(t)-1\right)\phi_H(\delta t).$$

We claim that for all $n$ large enough $\phi_{g_2}$ is a characteristic function, i.e. its inverse Fourier transform $g_2$ is a probability density. This involves showing that $g_2$ integrates to one and is nonnegative. The former easily follows from the fact that

$$\int_{-\infty}^{\infty}g_2(x)\,dx = \phi_{g_2}(0) = \phi_{g_1}(0) = 1,$$

since $\tau(0) = 0$ by construction and $\phi_{g_1}$ is a characteristic function. As far as the latter is concerned, we argue as follows: observe that $g_2$ is real-valued, because $\phi_{g_2}$ is symmetric and real-valued. By the Fourier inversion argument

$$\sup_x|g_2(x)-g_1(x)| \le \frac{1}{2\pi}\int_{-\infty}^{\infty}|\tau(t)|\,dt \to 0$$

as $n \to \infty$, by definition of $\tau$ and because $\delta \to 0$. Since $g_1$, being the Cauchy density, is strictly positive on the whole real line, provided $n$ is large enough it follows that

$$g_2(x) > 0, \quad x \in B, \tag{26}$$

where $B$ is a certain neighbourhood around zero. Next, we need to consider those $x$'s that lie outside this fixed neighbourhood of zero. We have

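$$g_2(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-itx}\left(\phi_{g_1}(t)+\frac{\delta^{\alpha+1/2}}{\lambda_2}(\phi_{g_1}(t)-1)\phi_H(\delta t)\right)dt$$

$$= \left(1+\frac{\delta^{\alpha+1/2}}{\lambda_2}\right)g_1(x) + \frac{\delta^{\alpha+1/2}}{\lambda_2}\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-itx}\phi_{g_1}(t)(\phi_H(\delta t)-1)\,dt - \frac{\delta^{\alpha+1/2}}{\lambda_2}\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-itx}\phi_H(\delta t)\,dt$$

$$= T_1(x)+T_2(x)+T_3(x).$$

Both $T_2(x)$ and $T_3(x)$ are real-valued by symmetry of $\phi_{g_1}$ and $\phi_H$ and the fact that these Fourier transforms are real-valued. Consequently, $g_2$ itself is also real-valued. Since $g_1$ is the Cauchy density and $\delta > 0$, the inequality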

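$$T_1(x) \ge \frac{1}{\pi}\frac{1}{1+x^2} \tag{27}$$

holds for all $x \in \mathbb{R}$. Assuming that $x \neq 0$ and integrating by parts, we get

$$T_2(x) = -\frac{1}{ix}\frac{\delta^{\alpha+1/2}}{\lambda_2}\frac{1}{2\pi}\int_{\mathbb{R}\setminus[-\delta^{-1},\delta^{-1}]}\phi_{g_1}(t)(\phi_H(\delta t)-1)\,de^{-itx} = \frac{1}{ix}\frac{\delta^{\alpha+1/2}}{\lambda_2}\frac{1}{2\pi}\int_{\mathbb{R}\setminus[-\delta^{-1},\delta^{-1}]}e^{-itx}\left[\phi_{g_1}(t)(\phi_H(\delta t)-1)\right]'dt.$$

Applying integration by parts to the last equality one more time, we obtain that

$$T_2(x) = -\frac{1}{x^2}\frac{\delta^{\alpha+1/2}}{\lambda_2}\frac{1}{2\pi}\int_{\mathbb{R}\setminus[-\delta^{-1},\delta^{-1}]}e^{-itx}\left[\phi_{g_1}(t)(\phi_H(\delta t)-1)\right]''dt,$$

which implies that

$$|T_2(x)| \le \frac{1}{x^2}C\delta^{\alpha+1/2}\int_{\mathbb{R}\setminus[-\delta^{-1},\delta^{-1}]}\left|\left[\phi_{g_1}(t)(\phi_H(\delta t)-1)\right]''\right|dt,$$

where the constant $C$ does not depend on $x$ and $n$. Since $\delta \to 0$ and the first and the second derivatives of $\phi_H$ are bounded on $\mathbb{R}$, it follows that

$$|T_2(x)| \le \frac{1}{x^2}C'\delta^{\alpha+1/2}\int_{\mathbb{R}\setminus[-\delta^{-1},\delta^{-1}]}e^{-|t|}\,dt,$$

where the constant $C'$ is independent of $n$ and $x$. In particular,

$$|T_2(x)| \le C'\delta^{\alpha+1/2}\frac{1}{x^2} \tag{28}$$

for all $n$ large enough. Finally, using integration by parts twice, one can also show that for $x \neq 0$

$$T_3(x) = \frac{1}{x^2}\frac{\delta^{\alpha+5/2}}{\lambda_2}\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-itx}\phi_H''(\delta t)\,dt$$

and hence

$$|T_3(x)| \le C''\delta^{\alpha+3/2}\frac{1}{x^2}, \tag{29}$$

where the constant $C''$ does not depend on $n$ and $x$. Therefore, by gathering (27)-(29), we conclude that for all $n$ large enough and all $x \in \mathbb{R}\setminus B$ the inequality

$$g_2(x) = T_1(x)+T_2(x)+T_3(x) \ge 0$$

is valid. Combining this with (26), we obtain that $g_2$ is a probability density.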

Now we turn to the model defined by the pair $(p_2,f_2)$. Again by the argument on pp. 22-23 of Gugushvili (2008).

Notice that by selecting $\alpha'$ in the definition of $\phi_{g_1}(t) = e^{-\alpha'|t|}$ large enough and $\lambda$ large enough, one can arrange that $f_2 \in \Sigma(\alpha,K_\Sigma)$, at least for all $n$ large enough. Without loss of generality we take $\alpha' = 1$. Set

$$\phi_{q_2}(t) = \left(p_2+(1-p_2)\phi_{f_2}(t)\right)e^{-t^2/2}.$$

This has the convolution structure as needed in our problem. Hence both pairs $(p_1,f_1)$ and $(p_2,f_2)$ belong to the class required in the statement of the theorem and generate the required models.
It is easy to see that

$$|p_2-p_1| \asymp \delta^{\alpha+1/2} \tag{30}$$

as $\delta \to 0$, where $\asymp$ means that two sequences are asymptotically of the same order. Consequently, by Lemma 8 of Butucea and Tsybakov (2008b), the lower bound in (16) will be of order $\delta^{2\alpha+1}$, provided we can prove that $n\chi^2(q_2,q_1) \to 0$ as $n \to \infty$ for an appropriate $\delta \to 0$. Here $\chi^2(q_2,q_1)$ is the $\chi^2$-divergence between the probability measures with densities $q_2$ and $q_1$, i.e.

$$\chi^2(q_2,q_1) = \int_{-\infty}^{\infty}\frac{(q_2(x)-q_1(x))^2}{q_1(x)}\,dx,$$

see p. 86 in Tsybakov (2009).
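For intuition about the $\chi^2$-divergence and about why the condition is stated in terms of $n\chi^2(q_2,q_1)$, the sketch below computes the divergence by numerical integration and applies the product formula $1+\chi^2(q_2^{\otimes n},q_1^{\otimes n}) = (1+\chi^2(q_2,q_1))^n$. The test case of two unit-variance normal densities, for which the divergence equals $e^{\mu^2}-1$, is an illustrative choice and not one of the densities used in the proof.

```python
import math

def normal_pdf(x, mean=0.0):
    # Density of the normal distribution with unit variance.
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def chi2_divergence(q2, q1, lo=-12.0, hi=12.0, steps=40000):
    # Trapezoidal approximation of int (q2 - q1)^2 / q1 dx on [lo, hi].
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * (q2(x) - q1(x)) ** 2 / q1(x)
    return total * h

def chi2_product(chi2, n):
    # Chi-square divergence between n-fold product measures.
    return (1.0 + chi2) ** n - 1.0

if __name__ == "__main__":
    mu = 0.3
    value = chi2_divergence(lambda x: normal_pdf(x, mu), normal_pdf)
    print(value, math.exp(mu * mu) - 1.0)
    print(chi2_product(1e-4, 100), chi2_product(1e-4, 10**6))
```

When $n\chi^2(q_2,q_1)$ is small, $(1+\chi^2)^n-1 \approx n\chi^2$ is small as well, so the two product measures are hard to tell apart.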
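Notice that we have

$$q_1(x) = e^{-\lambda_1}\varphi(x)+(1-e^{-\lambda_1})f_1*\varphi(x),$$

where $\varphi$ denotes the standard normal density. Let $\delta_1$ denote the first element of the sequence $\delta = \delta_n \downarrow 0$. Then

$$f_1(x) = \sum_{n=1}^{\infty}g_1^{*n}(x)P(N_{\lambda_1}=n\,|\,N_{\lambda_1}>0) \ge g_1(x)P(N_{\lambda_1}=1\,|\,N_{\lambda_1}>0) = g_1(x)\frac{P(N_{\lambda_1}=1)}{1-P(N_{\lambda_1}=0)} = g_1(x)\frac{\lambda_1e^{-\lambda_1}}{1-e^{-\lambda_1}},$$

cf. p. 23 in Gugushvili (2008). It follows that for all $x$

$$q_1(x) \ge (1-e^{-\lambda_1})f_1*\varphi(x) \ge \kappa_A\lambda e^{-\lambda-\delta_1^{\alpha+1/2}}g_1(|x|+A) = c_{\lambda,A}\,g_1(|x|+A) \tag{31}$$

for some large enough (but fixed) constant $A > 0$. Here the constant $\kappa_A = \int_{-A}^{A}k(t)\,dt$. The inequalities in (31) hold, because

$$(1-e^{-\lambda_1})f_1*\varphi(x) = (1-e^{-\lambda_1})\int_{-\infty}^{\infty}f_1(x-t)\varphi(t)\,dt \ge \lambda e^{-\lambda-\delta_n^{\alpha+1/2}}\int_{-\infty}^{\infty}g_1(x-t)\varphi(t)\,dt \ge \lambda e^{-\lambda-\delta_1^{\alpha+1/2}}\int_{-A}^{A}g_1(x-t)\varphi(t)\,dt \ge g_1(|x|+A)\,\lambda e^{-\lambda-\delta_1^{\alpha+1/2}}\kappa_A$$

by positivity of $g_1$ and $k$ and the fact that the Cauchy density is symmetric around zero and is decreasing on $[0,\infty)$.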

Now we will use (31) to bound the $\chi^2$-divergence between the densities $q_2$ and $q_1$. Write

$$\chi^2(q_2,q_1) = \int_{-\infty}^{\infty}\frac{(q_2(x)-q_1(x))^2}{q_1(x)}\,dx = \int_{-A}^{A}\frac{(q_2(x)-q_1(x))^2}{q_1(x)}\,dx + \int_{\mathbb{R}\setminus[-A,A]}\frac{(q_2(x)-q_1(x))^2}{q_1(x)}\,dx = S_1+S_2.$$

Using (31), for $S_1$ we have

$$S_1 \le \frac{1}{c_{\lambda,A}\min_{|x|\le A}g_1(|x|+A)}\int_{-\infty}^{\infty}(q_2(x)-q_1(x))^2\,dx = c_{A,g_1}\int_{-\infty}^{\infty}(q_2(x)-q_1(x))^2\,dx,$$

where $c_{A,g_1} > 0$ is a constant. By Parseval's identity the asymptotic behaviour of the integral on the right-hand side of the last equality can be studied as follows,


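$$\int_{-\infty}^{\infty}(q_2(x)-q_1(x))^2\,dx = \frac{1}{2\pi}\int_{-\infty}^{\infty}|\phi_{q_2}(t)-\phi_{q_1}(t)|^2\,dt = \frac{1}{2\pi}\int_{\mathbb{R}\setminus[-\delta^{-1},\delta^{-1}]}\left|e^{\lambda_2(\phi_{g_2}(t)-1)}-e^{\lambda_1(\phi_{g_1}(t)-1)}\right|^2e^{-t^2}\,dt \lesssim \int_{\mathbb{R}\setminus[-\delta^{-1},\delta^{-1}]}e^{-t^2}\left|\delta^{\alpha+1/2}(\phi_{g_1}(t)-1)\right|^2|1-\phi_H(\delta t)|^2\,dt.$$

Using this fact and boundedness of $\phi_H$ on the whole real line, we get that

$$\int_{-\infty}^{\infty}(q_2(x)-q_1(x))^2\,dx \lesssim \delta^{2\alpha+1}\int_{1/\delta}^{\infty}e^{-t^2}\,dt \lesssim \delta^{2\alpha+2}e^{-1/\delta^2}.$$

Thus by taking $\delta = c_\delta(\log n)^{-1/2}$ with a constant $0 < c_\delta < 1$ we can ensure that the right-hand side of the above display is $o(n^{-1})$ and consequently also that $S_1 = o(n^{-1})$.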
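The role of the constraint on the constant in $\delta = c(\log n)^{-1/2}$ can be seen by a direct computation: $e^{-1/\delta^2} = n^{-1/c^2}$, so $n\,\delta^{2\alpha+2}e^{-1/\delta^2}$ tends to zero when $c < 1$. The sketch below evaluates this quantity; the values of $\alpha$ and $c$ are illustrative assumptions.

```python
import math

def s1_bound_times_n(n, alpha, c):
    # n * delta^(2*alpha + 2) * exp(-1 / delta^2) with delta = c * (log n)^(-1/2).
    delta = c / math.sqrt(math.log(n))
    return n * delta ** (2.0 * alpha + 2.0) * math.exp(-1.0 / delta ** 2)

if __name__ == "__main__":
    for n in (10**2, 10**4, 10**8, 10**12):
        # c = 0.7 < 1: the product shrinks; c = 1.5 > 1: it grows.
        print(n, s1_bound_times_n(n, 1.0, 0.7), s1_bound_times_n(n, 1.0, 1.5))
```

With a constant larger than one the same expression grows with $n$, which shows that the restriction on the constant is not merely technical for this bound.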

Next we deal with $S_2$. By (31) we have that

$$q_1(x) \ge c_{\lambda,A}\frac{1}{\pi}\frac{1}{1+(|x|+A)^2}.$$

Therefore by Parseval's identity

$$S_2 \lesssim \int_{\mathbb{R}\setminus[-\delta^{-1},\delta^{-1}]}\left|\left[\phi_{q_2}(t)-\phi_{q_1}(t)\right]'\right|^2dt + \int_{\mathbb{R}\setminus[-\delta^{-1},\delta^{-1}]}|\phi_{q_2}(t)-\phi_{q_1}(t)|^2\,dt.$$

Exactly by the same type of argument as for $S_1$, after some laborious but easy computations, one can show that $S_2 = o(n^{-1})$, provided $\delta \asymp (\log n)^{-1/2}$ with a small enough constant. Consequently, with such a choice of $\delta$, we have $n\chi^2(q_2,q_1) \to 0$ as $n \to \infty$ and the theorem follows from Lemma 8 of Butucea and Tsybakov (2008b) and (30). □



Proof of Theorem 5. We use the same alternatives $(p_1,f_1)$ and $(p_2,f_2)$ as in the proof of Theorem 4. One needs to show that the $\chi^2$-divergence between the corresponding probability densities $q_1$ and $q_2$ is of order $O(n^{-1})$. The arguments used in the proof of Theorem 4 go through and to that end it suffices to show that

$$\int_{-\infty}^{\infty}|\phi_{q_2}(t)-\phi_{q_1}(t)|^2\,dt = O(n^{-1}) \tag{32}$$

and that

$$\int_{-\infty}^{\infty}\left|(\phi_{q_2}(t)-\phi_{q_1}(t))'\right|^2dt = O(n^{-1}). \tag{33}$$

Observe that for these two integrals to be finite, we need that $\beta > 1/2$, cf. the argument below. We have

$$\int_{-\infty}^{\infty}|\phi_{q_2}(t)-\phi_{q_1}(t)|^2\,dt = \int_{\mathbb{R}\setminus[-\delta^{-1},\delta^{-1}]}|\phi_Z(t)|^2\left|e^{\lambda_2(\phi_{g_2}(t)-1)}-e^{\lambda_1(\phi_{g_1}(t)-1)}\right|^2dt \lesssim \int_{\mathbb{R}\setminus[-\delta^{-1},\delta^{-1}]}|\phi_Z(t)|^2\left|\delta^{\alpha+1/2}(\phi_{g_1}(t)-1)\right|^2|1-\phi_H(\delta t)|^2\,dt.$$

Now change the integration variable in the last integral from $t$ to $s = \delta_nt$ and use the fact that for all $|s| \ge 1$ and for $\delta_n$ small enough by assumption on $\phi_Z$ it holds that $|\phi_Z(s/\delta_n)||s/\delta_n|^{\beta} \le d_1$, to conclude that the left-hand side of (32) is of order $\delta_n^{2\alpha+2\beta}$. Selecting $\delta_n \asymp n^{-1/(2\alpha+2\beta)}$ then yields (32). A similar argument works in case of (33). We also remark that the condition on $\phi_Z'$ given in the statement of the theorem is needed to treat (33). Application of Lemma 8 of Butucea and Tsybakov (2008b) as in Theorem 4 concludes the proof. □

Appendix A 

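Lemma 1. Let $p^* < 1$ and let $\hat p_{ng_n}$ be the truncated version of the estimator $p_{ng_n}$. Under the same conditions as in Theorem 1 (i), we have

$$\sup_{f\in\Sigma(\alpha,K_\Sigma),\,p\in[0,p^*]}E[(\hat p_{ng_n}-p)^2] \lesssim n^{-(2\alpha+1)/(2\alpha+2\beta)},$$

while under the conditions of Theorem 1 (ii) the inequality

$$\sup_{f\in\Sigma(\alpha,K_\Sigma),\,p\in[0,p^*]}E[(\hat p_{ng_n}-p)^2] \lesssim (\log n)^{-(2\alpha+1)/\beta}$$

holds.

Proof of Lemma 1. Introduce the notation $\sup_{f,p} = \sup_{f\in\Sigma(\alpha,K_\Sigma),\,p\in[0,p^*]}$. Let $n$ be so large that $p^* < 1-\epsilon_n$, which is possible, because $p^* < 1$ and $\epsilon_n \downarrow 0$. Then

$$\sup_{f,p}E[(\hat p_{ng_n}-p)^2] \le \sup_{f,p}E[(p_{ng_n}-p)^2].$$

This and Theorem 1 entail the desired result. □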

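Lemma 2. Under the same conditions as in Theorem 1 and provided $\epsilon_n = (\log 3n)^{-1}$ the inequality

$$\sup_{f\in\Sigma(\alpha,K_\Sigma),\,p\in[0,p^*]}E\left[\frac{(\hat p_{ng_n}-p)^2}{(1-\hat p_{ng_n})^2(1-p)^2}\right] \lesssim g_n^{2\alpha+1}$$

holds.

Proof. Introduce the sequence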

(34) 



^„ = 100V^sC/<! 1 - 



1//3 






1+1/2 



(log n) 



-1//3 



and notice that $\psi_n = 100\sqrt{K_\Sigma}\,Cg_n^{\alpha+1/2}$ in the supersmooth case, i.e. in the setting of Theorem 1 (ii). The constants in the definition of $\psi_n$ are rather arbitrary, but they suffice for our purposes. Notice that on the set $\{|\hat p_{ng_n}-p| \le \psi_n\}$ for all $n$ large enough the inequality

$$|1-\hat p_{ng_n}| \ge 1-p^*-\psi_n$$

holds, because $\psi_n \to 0$. We have

$$E\left[\frac{(\hat p_{ng_n}-p)^2}{(1-\hat p_{ng_n})^2(1-p)^2}\right] = E\left[\frac{(\hat p_{ng_n}-p)^2}{(1-\hat p_{ng_n})^2(1-p)^2}1_{[|\hat p_{ng_n}-p|\le\psi_n]}\right] + E\left[\frac{(\hat p_{ng_n}-p)^2}{(1-\hat p_{ng_n})^2(1-p)^2}1_{[|\hat p_{ng_n}-p|>\psi_n]}\right]$$

$$\lesssim E[(\hat p_{ng_n}-p)^2] + \frac{1}{\epsilon_n^2}P(|\hat p_{ng_n}-p|>\psi_n) \lesssim g_n^{2\alpha+1} + \frac{1}{\epsilon_n^2}P(|\hat p_{ng_n}-p|>\psi_n),$$

where in the last inequality we used Lemma 1 and Theorem 1. It is easy to see that for all $f \in \Sigma(\alpha,K_\Sigma)$ and $p \in [0,p^*]$ the constants in this chain of inequalities can be made independent of a particular $f$ and a particular $p$. Then applying Lemma 3 and taking the supremum over $f \in \Sigma(\alpha,K_\Sigma)$ and $p \in [0,p^*]$ on the right-hand side of the last inequality establishes the desired result, because

$$\sup_{f,p}\frac{1}{\epsilon_n^2}P(|\hat p_{ng_n}-p|>\psi_n) = o(g_n^{2\alpha+1})$$

holds under our conditions on $\epsilon_n$ and $g_n$. □

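Lemma 3. Define the sequence $\psi_n$ by (34) and let $\epsilon_n = (\log 3n)^{-1}$. Let $\hat p_{ng_n}$ be the truncated version of the estimator $p_{ng_n}$. Under the same conditions as in Theorem 1 (i) we have

$$\sup_{f\in\Sigma(\alpha,K_\Sigma),\,p\in[0,p^*]}P(|\hat p_{ng_n}-p|>\psi_n) \lesssim \frac{1}{\psi_ng_n^{\beta}}\exp\left(-\mathrm{const}\times ng_n^{2\beta}\right) + \exp\left(-\mathrm{const}'\times\psi_n^2g_n^{2\beta}n\right),$$

while under those in Theorem 1 (ii) it holds that

$$\sup_{f\in\Sigma(\alpha,K_\Sigma),\,p\in[0,p^*]}P(|\hat p_{ng_n}-p|>\psi_n) \lesssim \frac{1}{\psi_n}e^{1/(\gamma g_n^{\beta})}\exp\left(-\mathrm{const}\times ne^{-2/(\gamma g_n^{\beta})}\right) + \exp\left(-\mathrm{const}'\times\psi_n^2e^{-2/(\gamma g_n^{\beta})}n\right)$$

for the case when $\beta_0 \ge 0$, and

$$\sup_{f\in\Sigma(\alpha,K_\Sigma),\,p\in[0,p^*]}P(|\hat p_{ng_n}-p|>\psi_n) \lesssim \frac{1}{\psi_n}g_n^{\beta_0}e^{1/(\gamma g_n^{\beta})}\exp\left(-\mathrm{const}\times ng_n^{-2\beta_0}e^{-2/(\gamma g_n^{\beta})}\right) + \exp\left(-\mathrm{const}'\times\psi_n^2g_n^{-2\beta_0}e^{-2/(\gamma g_n^{\beta})}n\right)$$

for the case when $\beta_0 < 0$. Here const and const$'$ are some universal constants (not necessarily the same in all three cases) independent of particular $n$, $p \in [0,p^*]$ and $f \in \Sigma(\alpha,K_\Sigma)$.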

Proof. In this proof we continue numbering of the terms from the proof of Theorem 2, because it is the proof of Theorem 2 where this lemma finds its primary use. Observe that

$$P(|\hat p_{ng_n}-p|>\psi_n) \le P(|E[\hat p_{ng_n}]-p|>\psi_n/2) + P(|\hat p_{ng_n}-E[\hat p_{ng_n}]|>\psi_n/2) = T_{15}+T_{16}.$$

We have

$$|E[\hat p_{ng_n}]-p| \le |E[p_{ng_n}]-p| + |E[\hat p_{ng_n}-p_{ng_n}]|$$

$$\le |E[p_{ng_n}]-p| + \left|E[(1-\epsilon_n-p_{ng_n})1_{[p_{ng_n}>1-\epsilon_n]}]\right| + \left|E[(-1+\epsilon_n-p_{ng_n})1_{[p_{ng_n}<-1+\epsilon_n]}]\right|$$

$$\le |E[p_{ng_n}]-p| + E[|1-\epsilon_n-p_{ng_n}|1_{[p_{ng_n}>1-\epsilon_n]}] + E[|-1+\epsilon_n-p_{ng_n}|1_{[p_{ng_n}<-1+\epsilon_n]}] = T_{17}+T_{18}+T_{19}.$$

We put the study of $T_{17}$ aside for a while and consider the other two terms. Since $T_{18}$ and $T_{19}$ can be studied in a similar manner, we consider only $T_{18}$. Our goal is to show that $T_{18}$ (and by extension $T_{19}$) is negligible in comparison to $T_{17}$. We have

$$T_{18} \le \left(1-\epsilon_n+\frac{1}{2}\int_{-1}^{1}\frac{|\phi_u(t)|}{|\phi_Z(t/g_n)|}\,dt\right)P(p_{ng_n}>1-\epsilon_n).$$

The right-hand side in both cases of the ordinary smooth or supersmooth $Z$ is of smaller order than $T_{17}$, which can be seen by employing the arguments on pp. 1265-1266 from Fan (1991) used to bound the integral on the right-hand side of the above display and by the exponential bounds on $P(p_{ng_n}>1-\epsilon_n)$, which we formulate separately in Lemma 4. With our conditions on $g_n$ these bounds imply that $\sup_{f,p}T_{18}$ is of lower order than $T_{17}$. The same is true for $\sup_{f,p}T_{19}$. As a consequence, $\sup_{f,p}(T_{18}+T_{19}) \le T_{17}$ for all $n$ large enough. Thus $T_{15} = 0$, provided $n$ is large enough, because $T_{17} \le \psi_n/4$ for all $n$ large enough, and in fact $\sup_{f,p}T_{15} = 0$ for all $n$ large enough.
It remains to study $T_{16}$. We have

$$T_{16} \le P(|\hat p_{ng_n}-p_{ng_n}|>\psi_n/4) + P(|p_{ng_n}-E[\hat p_{ng_n}]|>\psi_n/4)$$

$$\le P(|\hat p_{ng_n}-p_{ng_n}|>\psi_n/4) + P(|p_{ng_n}-E[p_{ng_n}]|>\psi_n/8) + P(|E[p_{ng_n}]-E[\hat p_{ng_n}]|>\psi_n/8) = T_{20}+T_{21}+T_{22}.$$

Notice that

$$T_{20} \le P(|1-\epsilon_n-p_{ng_n}|1_{[p_{ng_n}>1-\epsilon_n]}>\psi_n/8) + P(|-1+\epsilon_n-p_{ng_n}|1_{[p_{ng_n}<-1+\epsilon_n]}>\psi_n/8).$$

We consider e.g. the first term on the right-hand side. It is bounded by

$$\frac{8}{\psi_n}\left(1-\epsilon_n+\frac{1}{2}\int_{-1}^{1}\frac{|\phi_u(t)|}{|\phi_Z(t/g_n)|}\,dt\right)P(p_{ng_n}>1-\epsilon_n).$$

Next, as we did above, we use the order bound on the integral on the right-hand side, cf. pp. 1265-1266 in Fan (1991), and the exponential bounds on $P(p_{ng_n}>1-\epsilon_n)$ from (35) and (36) from Lemma 4 to bound the first term in the upper bound on $T_{20}$. Similar reasoning applies to the second term in the upper bound on $T_{20}$. There we use Lemma 5. These bounds give the first term on the right-hand side of the three different formulae in the statement of the lemma.

To bound $T_{21}$, we apply the exponential inequalities from Lemma 6. The terms on the right-hand side will then give the second terms in the three formulae on the right-hand side in the statement of the lemma.

Finally, we turn to $T_{22}$. Our goal is to show that there exists $n'$ independent of $p$ and $f$, such that for all $n \ge n'$ we have $T_{22} = 0$. It holds that

$$|E[\hat p_{ng_n}]-E[p_{ng_n}]| \le E[|p_{ng_n}-1+\epsilon_n|1_{[p_{ng_n}>1-\epsilon_n]}] + E[|p_{ng_n}+1-\epsilon_n|1_{[p_{ng_n}<-1+\epsilon_n]}].$$

As the arguments for both terms on the right-hand side are similar, we consider only the first term. We have

$$E[|p_{ng_n}-1+\epsilon_n|1_{[p_{ng_n}>1-\epsilon_n]}] \le \left(1+\epsilon_n+\frac{1}{2}\int_{-1}^{1}\frac{|\phi_u(t)|}{|\phi_Z(t/g_n)|}\,dt\right)P(p_{ng_n}>1-\epsilon_n).$$

By Lemmas 4 and 5 and the argument as on pp. 1265-1266 of Fan (1991) the right-hand side is negligible compared to $\psi_n$ and it follows that $T_{22}$ is zero for all large enough $n$. In fact $n'$ can be found, such that this holds true uniformly in $p$ and $f$ for all $n \ge n'$. Gathering all the above bounds entails the statement of the lemma. □



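Lemma 4. Let $p_{ng_n}$ be defined by (1). Under the conditions of Theorem 1 (i) we have

$$\sup_{p\in[0,p^*],\,f\in\Sigma(\alpha,K_\Sigma)}P(p_{ng_n}>1-\epsilon_n) \le \exp\left(-\mathrm{const}\times ng_n^{2\beta}\right), \tag{35}$$

while under the conditions of Theorem 1 (ii) we have

$$\sup_{p\in[0,p^*],\,f\in\Sigma(\alpha,K_\Sigma)}P(p_{ng_n}>1-\epsilon_n) \le \begin{cases}\exp\left(-\mathrm{const}\times ne^{-2/(\gamma g_n^{\beta})}\right), & \text{if } \beta_0 \ge 0,\\ \exp\left(-\mathrm{const}\times ng_n^{-2\beta_0}e^{-2/(\gamma g_n^{\beta})}\right), & \text{if } \beta_0 < 0.\end{cases} \tag{36}$$

Here const is a universal constant independent of particular $n$, $p \in [0,p^*]$ and $f \in \Sigma(\alpha,K_\Sigma)$.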


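Proof. We have

$$P(p_{ng_n}>1-\epsilon_n) = P(p_{ng_n}-E[p_{ng_n}]>1-\epsilon_n-E[p_{ng_n}]) \le P(|p_{ng_n}-E[p_{ng_n}]|>1-\epsilon_n-E[p_{ng_n}])$$

$$= P\left(\left|\sum_{j=1}^{n}\left(U_n\left(\frac{-X_j}{g_n}\right)-E\left[U_n\left(\frac{-X_j}{g_n}\right)\right]\right)\right| > \frac{n}{\pi}\left(1-\epsilon_n-E[p_{ng_n}]\right)\right),$$

where

$$U_n(x) = \frac{1}{2\pi}\int_{-1}^{1}e^{-itx}\frac{\phi_u(t)}{\phi_Z(t/g_n)}\,dt.$$

Under the conditions of Theorem 1 (i) we have

$$|U_n(x)| \le \frac{C}{2\pi}\frac{1}{g_n^{\beta}},$$

while under those of Theorem 1 (ii) the inequality

$$|U_n(x)| \le \begin{cases}\dfrac{C'}{2\pi}e^{1/(\gamma g_n^{\beta})}, & \text{if } \beta_0 \ge 0,\\[2mm] \dfrac{C''}{2\pi}g_n^{\beta_0}e^{1/(\gamma g_n^{\beta})}, & \text{if } \beta_0 < 0\end{cases}$$

holds. Here $C$, $C'$ and $C''$ are some constants independent of $n$. By (18) we have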

(37) |E [p„gj| < |E b„,J - p| +p < p* + -^yi^C/g^'/'- 
By taking $n_0$ so large that for all $n \ge n_0$

(38) p* + -^v^C/g«+i/2<l-e„ 

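holds, one can ensure that uniformly in $f$ and $p$, $1-\epsilon_n-E[p_{ng_n}] > 0$ for $n \ge n_0$. Then by Hoeffding's inequality, see Lemma A.4 on p. 198 of Tsybakov (2009), we obtain

$$P(p_{ng_n}>1-\epsilon_n) \le 2\exp\left(-2\frac{(1-\epsilon_n-E[p_{ng_n}])^2}{C^2}\,ng_n^{2\beta}\right)$$

for the setting of Theorem 1 (i), and

$$P(p_{ng_n}>1-\epsilon_n) \le \begin{cases}2\exp\left(-2\dfrac{(1-\epsilon_n-E[p_{ng_n}])^2}{(C')^2}\,ne^{-2/(\gamma g_n^{\beta})}\right), & \text{if } \beta_0 \ge 0,\\[2mm] 2\exp\left(-2\dfrac{(1-\epsilon_n-E[p_{ng_n}])^2}{(C'')^2}\,ng_n^{-2\beta_0}e^{-2/(\gamma g_n^{\beta})}\right), & \text{if } \beta_0 < 0\end{cases}$$

for the setting of Theorem 1 (ii). Since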

(39) 1 - e„ - E [p„gj > 1 - e„ - p* - -^v/^C^3"+'/' > 



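for all $n$ large enough and uniformly in $f$ and $p$, see (37), there exists a constant const independent of $n$, $p \in [0,p^*]$ and $f \in \Sigma(\alpha,K_\Sigma)$, such that

$$\sup_{p,f}P(p_{ng_n}>1-\epsilon_n) \le \exp\left(-\mathrm{const}\times ng_n^{2\beta}\right)$$

for the setting of Theorem 1 (i), and

$$\sup_{p,f}P(p_{ng_n}>1-\epsilon_n) \le \begin{cases}\exp\left(-\mathrm{const}\times ne^{-2/(\gamma g_n^{\beta})}\right), & \text{if } \beta_0 \ge 0,\\ \exp\left(-\mathrm{const}\times ng_n^{-2\beta_0}e^{-2/(\gamma g_n^{\beta})}\right), & \text{if } \beta_0 < 0\end{cases}$$

for the setting of Theorem 1 (ii). This concludes the proof. □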
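The mechanism behind these exponential bounds is Hoeffding's inequality for averages of bounded summands: when each summand lies in an interval whose length is of order $g_n^{-\beta}$, the exponent scales like $ng_n^{2\beta}$. The following sketch states the two-sided bound and compares it with an exact binomial tail; the Bernoulli example and the function names are illustrative assumptions, not part of the proof.

```python
import math

def hoeffding_bound(n, t, width):
    # Two-sided Hoeffding bound for P(|mean - E[mean]| >= t), where the n
    # independent summands each take values in an interval of length `width`.
    return 2.0 * math.exp(-2.0 * n * t * t / (width * width))

def exact_bernoulli_tail(n, m):
    # Exact P(|S_n - n/2| >= m) for S_n ~ Binomial(n, 1/2) and integer m,
    # computed with integer arithmetic to avoid rounding at the boundary.
    total = 0
    for k in range(n + 1):
        if abs(2 * k - n) >= 2 * m:
            total += math.comb(n, k)
    return total / 2 ** n

if __name__ == "__main__":
    n, m = 100, 10
    print(exact_bernoulli_tail(n, m), hoeffding_bound(n, m / n, 1.0))
    # Summands bounded by C / g^beta in absolute value: interval length 2C / g^beta.
    C, g, beta, t = 1.0, 0.5, 2.0, 0.3
    print(hoeffding_bound(1000, t, 2.0 * C / g ** beta))
```

In the second call the exponent equals $-t^2ng^{2\beta}/(2C^2)$, which displays the $ng_n^{2\beta}$ scaling appearing in (35).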

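Lemma 5. Let $p_{ng_n}$ be defined by (1). Under the conditions of Theorem 1 (i) we have

$$\sup_{p\in[0,p^*],\,f\in\Sigma(\alpha,K_\Sigma)}P(p_{ng_n}<-1+\epsilon_n) \le \exp\left(-\mathrm{const}\times ng_n^{2\beta}\right),$$

while under the conditions of Theorem 1 (ii) we have

$$\sup_{p\in[0,p^*],\,f\in\Sigma(\alpha,K_\Sigma)}P(p_{ng_n}<-1+\epsilon_n) \le \begin{cases}\exp\left(-\mathrm{const}\times ne^{-2/(\gamma g_n^{\beta})}\right), & \text{if } \beta_0 \ge 0,\\ \exp\left(-\mathrm{const}\times ng_n^{-2\beta_0}e^{-2/(\gamma g_n^{\beta})}\right), & \text{if } \beta_0 < 0.\end{cases}$$

Here const is a universal constant independent of particular $n$, $p \in [0,p^*]$ and $f \in \Sigma(\alpha,K_\Sigma)$.

Proof. The proof is analogous to the proof of Lemma 4 and is therefore omitted. □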

Lemma 6. Let $p_{ng_n}$ be defined by (1). Under the conditions of Theorem 1 (i) we have

$$\sup_{p\in[0,p^*],\,f\in\Sigma(\alpha,K_\Sigma)}P(|p_{ng_n}-E[p_{ng_n}]|>\psi_n/8) \le \exp\left(-\mathrm{const}'\times\psi_n^2ng_n^{2\beta}\right), \tag{40}$$

while under the conditions of Theorem 1 (ii)

$$\sup_{p\in[0,p^*],\,f\in\Sigma(\alpha,K_\Sigma)}P(|p_{ng_n}-E[p_{ng_n}]|>\psi_n/8) \le \exp\left(-\mathrm{const}'\times\psi_n^2ne^{-2/(\gamma g_n^{\beta})}\right) \tag{41}$$

holds. Here const$'$ is a universal constant independent of particular $n$, $p \in [0,p^*]$ and $f \in \Sigma(\alpha,K_\Sigma)$.

Proof. These inequalities can be established by using Hoeffding's inequality in the same way as the exponential bounds on $P(p_{ng_n}>1-\epsilon_n)$ from Lemma 4. □